Jewish Studies in the Digital Age:
Introduction 


In the past two decades we have witnessed the rapid increase of events, work- 
shops, sessions, and projects that, in one way or another, are positioned at the 
intersection of Jewish Studies and Digital Humanities (henceforth: DHJewish). 
This should hardly come as a surprise given the institutionalization of the digital 
humanities as a field and the ongoing proliferation of cultural heritage online. 
These developments have inevitably led to the need to confront the consequences 
of the so-called digital turn in all fields and disciplines of the humanities and 
social sciences. The international conference #DHJewish — Jewish Studies in the 
Digital Age, organized by the Centre for Contemporary and Digital History (C2DH) 
of the University of Luxembourg in January 2021, aimed to do so with regard to 
the field known as Jewish Studies.¹ As in so many comparable events, the conference
highlighted the common methodological and epistemological challenges of
much work in the digital humanities while, at the same time, seeking to probe 
the particular. After all, every field and (sub-)discipline has its own characteris- 
tics, preferred methodologies, and specific research questions for which certain 
digital approaches might be particularly suitable. This volume gathers a selection 
of revised and extended papers from the conference that, taken together, reflect 
the current state of what we call DHJewish. 

The question of how technology affects or might affect Jewish Studies in the 
present and future is not new.² In fact, attempts to chart how new technologies
could alter research practices in Jewish Studies date back to at least 1969 when 
Henry Ekstein, during the Fifth World Congress of Jewish Studies in Jerusalem, 
presented a paper outlining a vision for a database that, in his view, would solve 


1 For more information and a link to the online conference archive with all session recordings, 
see: https://www.c2dh.uni.lu/events/dhjewish-jewish-studies-digital-age (all URLs in this intro- 
duction were last accessed on January 18, 2022, unless noted otherwise). All authors of this intro- 
duction were part of the conference program committee. 

2 This introduction is partly based upon: Gerben Zaagsma, “#DHJewish - Jewish Studies in the 
Digital Age," Medaon 12 (2018): 1-11; Gerben Zaagsma, Keynote lecture, “Exploring Jewish His- 
tory in the Digital Age," International Conference What's New, What's Next? Innovative Methods, 
New Sources, and Paradigm Shifts in Jewish Studies, POLIN Museum of the History of Polish Jews, 
Warsaw, October 3-6, 2021. See: https://doi.org/10.5281/zenodo.6631756.


Open Access. © 2022 Gerben Zaagsma et al., published by De Gruyter. This work is licensed
under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
https://doi.org/10.1515/9783110744828-001 
many of the information storage and retrieval challenges facing the field of Jewish
Studies.³ Ekstein argued that, given the state of technology, it was possible to
"establish a large scale data bank which would include information in the field 
of Jewish Studies, and to connect this bank by means of a telecommunication 
network to the most important research centers in this field all over the world."⁴
Ekstein's vision is reminiscent of early information organization systems, as 
developed and realized by the Belgian lawyer and proto information scientist Paul 
Otlet and his companion Henri Lafontaine.⁵ It also echoes developments in data
communications networks that got underway in the 1960s and would eventually 
evolve into the Internet in the 1980s and World Wide Web in the 1990s. Ekstein, 
in sum, was effectively outlining what an online library for Jewish Studies could 
look like in an age when digital computing began to replace its analog siblings. 
Ekstein's remarks raise a number of broader questions, however. To begin 
with, they hint at the longue durée of the encounter between technology and the 
humanities, in techno-material terms as well as in thinking through its broader 
consequences for humanities research practices. Secondly, they highlight chal- 
lenges in the realm of information management that were by no means exclusive 
to the field of Jewish Studies alone and had been discussed by early informa- 
tion scientists for some decades already. And finally, Ekstein's remarks notwith- 
standing, the mechanical tools that he spoke of had in fact undergone signifi- 
cant change since the late 19th century and Jewish Studies scholars had keenly 
taken advantage of the possibilities they afforded ever since then. Indeed, while 
Ekstein might have been one of the first to reflect upon the overall state of Jewish 
Studies at the dawn of humanities computing, he clearly drew upon important 
developments preceding him. In fact, historians such as Samuel Oppenheim had 
been using the photostat since the 1920s to gather dispersed materials about their 
research topics. In the late 1930s and early 1940s geographers and demographers 
began to use punched card technology in the realm of Jewish Population Studies, 
following its entry in academic research in general, not only in the sciences, but 
also anthropology, literature, the social sciences, and (economic) history.⁶ Not


3 Henry C. Ekstein, "How to Increase Effectiveness of Research in Jewish Studies," Proceedings
of the World Congress of Jewish Studies (1969): 3-7.

4 Ekstein, "How to Increase Effectiveness of Research in Jewish Studies," 5.

5 W. Boyd Rayward, ed., International Organisation and Dissemination of Knowledge: Selected 
Essays of Paul Otlet (Amsterdam: Elsevier, 1990). 

6 Sophia Robison and Joshua Starr, Jewish Population Studies: Conference on Jewish Relations, 
New York, 1943 (New York: Conference on Jewish Relations, 1943). For a general overview of the 
emerging application of punched card technology in academia in the 1930s, see: G.W. Baehne, 
Practical Applications of the Punched Card Method in Colleges and Universities (New York: Colum- 
bia University Press, 1935). 
much later, in the late 1940s, microfilming Jewish archives got underway, while
work also began on automated machine translation, concordances, and word
indexes.⁷

The latter's impact in Jewish Studies would soon be felt. On March 26, 1958, 
the Jesuit priest and scholar Roberto Busa and engineer Paul Tasman held a press 
conference at IBM's headquarters in New York to describe their literary data pro- 
cessing work on the Dead Sea Scrolls using punched card technology. Their pres- 
entation made headlines all over the world and Busa subsequently also presented 
the research in July 1961 at the third World Congress of Jewish Studies in Jerusalem.⁸
This work was an offshoot of Busa’s work on the Index Thomisticus, begun
in 1949 with the help of IBM, which would eventually earn him the reputation of 
“founding father" of the Digital Humanities, though this foundational myth has 
meanwhile been thoroughly questioned and scholars have recently also emphasized
the "female operatives" of his work and interrogated the "cultural, intellectual,
and social conditions that shaped the earliest work in digital humanities."⁹

In 1959, inspired by Busa, the recently founded Israel Academy of the Hebrew 
Language decided to create a database for its Historical Dictionary, today's
ma'agarim, one of the first of its kind for a major historical language.¹⁰ Around
the same time, great strides were also made in the realm of Yiddish Studies and 


7 Dolores M. Burton, "Automated Concordances and Word Indexes: The Fifties," Computers and 
the Humanities 15, no. 1 (1981), https://doi.org/10.1007/BF02404370. 

8 Paul Tasman, Indexing the Dead Sea Scrolls by Electronic Literary Data Processing Methods 
(New York: IBM, 1958). Roberto Busa, “The Index of All Non Biblical Dead Sea Scrolls Published 
up to December 1957,” Revue de Qumrân 2 (October 1958): 187-98. For the global resonance see,
for example, this set of articles on the Dutch Delpher website: accessed May 2, 2018, http://bit. 
ly/2mu7htf. Busa himself reminisced about the occasion in: Roberto Busa, “The Annals of Hu- 
manities Computing: The Index Thomisticus," Computers and the Humanities 14, no. 2 (1980): 
83-90, 85, https://doi.org/10.1007/bf02403798. For an elaborate analysis of this work see also: 
Steven E. Jones, Roberto Busa, S.J., and the Emergence of Humanities Computing: The Priest and 
the Punched Cards (Routledge: London, 2016), ch. 5. 

9 Steven Jones has complicated and contextualized this founding myth in his important work 
on Busa, see: Jones, Roberto Busa. See also, Melissa Terras and Julianne Nyhan, “Father Busa's 
Female Card Operatives," in Debates in the Digital Humanities 2016, https://dhdebates.gc.cuny. 
edu/read/untitled/section/1e57217b-f262-4f25-806b-Afcf1548beb5; Geoffrey Rockwell and Stefan 
Sinclair, “Tremendous Mechanical Labor: Father Busa's Algorithm," DHQ 14, no. 3 (2020), http:// 
www.digitalhumanities.org/dhq/vol/14/3/000456/000456.html. 

10 See: https://maagarim.hebrew-academy.org.il/. On the history of the project see: Zeev Ben- 
Haim, “On the Making of the Historical Dictionary of the Hebrew Language of the Academy of the 
Hebrew Language," Leshonenu 23 (1959): 102-23 [Hebrew]; R. Merkin, Z. Busharia, and E. Meir, 
“The Historical Dictionary of the Hebrew Language," Literary and Linguistic Computing 4, no. 4
(January 1, 1989): 271-73, https://doi.org/10.1093/llc/4.4.271; Israel Yeivin, “Le dictionnaire historique
de la langue hébraïque," Meta 43, no. 1 (1998): 19-26.
Yiddish linguistics to expand the use of punched cards to enable the processing
of linguistic data. In 1961 the linguist Uriel Weinreich managed to acquire
funding to use “machine aids" for the creation of the Language and Culture Atlas
of Ashkenazic Jewry (LCAAJ). A proposal from around 1960 in the LCAAJ archives
shows a request for 200,000 cards and 500 hours of machine use at the newly
created IBM Watson Scientific Laboratory at Columbia University.¹¹ As Weinreich
noted: “When one tries to visualize the editing of an atlas of many hundreds of
maps, with up to 500 locations on each, it becomes clear what advantages are
gained by the electronic filing and sorting of the data.”¹² A few years later, in 1967,
the Responsa Project launched, a computerized full-text retrieval system providing
access to, as Yaacov Choueka described it, “Rabbinical case-law documents
spanning more than ten centuries."¹³


1 From Humanities Computing to Digital 
Humanities 


As this brief overview makes clear, when Ekstein told his audience at
the World Congress of Jewish Studies in 1969 that “the mechanical tools with which 
the researcher works have remained almost unchanged in the last five hundred 
years," he missed some very important developments. At that point in time, the 
late 1960s, humanities computing had firmly taken hold in the academy with its 
own conferences and events being organized as well as its own publications. It 
would continue to develop as the era of mainframe computing began to give way 
to micro- and personal computers which were introduced at universities from the 
late 1970s onwards. From 1973 onwards, a long series of 90 colloquia on humani- 
ties computing at the University of Tübingen demonstrates well the growing use of 
computers, especially the TUebinger System von TExtverarbeitungs-Programmen 
(TUSTEP),¹⁴ with a considerable impact on Jewish Studies in (printed) editions or


11 "Description of Project," Box 236, Folder 13, Archives of the Language and Culture Atlas of 
Ashkenazic Jewry, Rare Book and Manuscript Library, Columbia University. 

12 Uriel Weinreich, “Machine Aids in the Compilation of Linguistic Atlases," American Philo- 
sophical Society Yearbook 1963 (1964): 622-25. 

13 Yaacov Choueka, “Computerized Full-Text Retrieval Systems and Research in the Humani- 
ties: The Responsa Project," Computers and the Humanities 14, no. 3 (November 1, 1980): 153-69, 
https://doi.org/10.1007/BF02403764. 

14 This powerful computer program for philological work, editions, synopses, concordances, 
and sophisticated page-layout is still used in many large projects more than 40 years after its 
first version. 
synopses of the Mishnah, the Palestinian Talmud, and Hekhalot Literature and
a concordance of the latter.¹⁵ For the study of the biblical text, morphologically
tagged versions of Hebrew, Greek, Syriac, Latin, and other texts were developed
and aligned from the early 1980s and led to breakthroughs in the study of translation
and a series of five conferences on computing and the Bible.¹⁶ When the Internet
and World Wide Web arrived on the scene in the early 1990s, as before, Jewish
Studies scholars were quick to take advantage. Beginning in 1988, for instance,
Martin Abegg famously used a personal computer to reconstruct the text of the
unpublished Dead Sea Scrolls from the unpublished Preliminary Concordance,
using a Macintosh SE that some nicknamed “Rabbi Computer."¹⁷ His work led to
their groundbreaking but controversial publication in 1991 and was quickly integrated
into the first commercial program packages for Biblical Studies.¹⁸

During all this time, librarians and archivists eagerly discussed and explored 
the possible applications of new technologies too, including for instance on the 
pages of the American journal Judaica Librarianship.¹⁹ They also actively sought
to shape the future form of the Jewish documentary record. In October 1991, at a 


15 Michael Krupp, “Das Mischna-Editionsprojekt” (February 2, 1974), Michael Krupp, “Computer-unterstützte
Zusammenstellung von textkritischen Apparaten. Erfahrungen bei der Vorbereitung
der Mischna-Edition" (July 2, 1977), Gottfried Reeg, “Spaltensynopse und Zeilensynopse als
Darstellungsformen für kritische Texteditionen" and “Konkordanz zur Hekhalot-Literatur” (both
June 30, 1984). Summaries accessible at http://www.tustep.uni-tuebingen.de/kolloq.html. The
projects led to Michael Krupp et al., Mischna, 7 vols. (Jerusalem: Lee Achim, 2018); Peter Schäfer
with Margarete Schlüter and Hans-Georg von Mutius, Synopse zur Hekhalot-Literatur. Texts and
Studies in Ancient Judaism 2 (Tübingen: Mohr-Siebeck, 1981); Peter Schäfer with Gottfried Reeg
et al., Konkordanz zur Hekhalot-Literatur, 2 vols. TSAJ 12-13 (Tübingen: Mohr-Siebeck, 1986 and
1988); Peter Schäfer, Hans-Jürgen Becker with Gottfried Reeg et al., Synopse zum Talmud Yerushalmi,
7 vols. (TSAJ 31, 33, 35, 82, 83, 67, 47) (Tübingen: Mohr-Siebeck, 1991-2001).

16 Robert Kraft and Emanuel Tov, “Computer-Assisted Tools for Septuagint Studies," Bulletin
of the International Organization of Septuagint and Cognate Studies 14 (1981): 22-40. Robert A.
Kraft, Emanuel Tov, and John R. Abercrombie, Computer Assisted Tools for Septuagint Studies
(Atlanta, GA: Scholars Press, 1986). The first conference was published as Proceedings of the
First International Colloquium Bible and Computer, The Text, Louvain-la-Neuve (Belgique) 2-3-4
September 1985 (Paris: Champion; Geneva: Slatkine, 1986).

17 Jones, Roberto Busa, 144–47, 145.

18 In particular, Accordance Software, which started in 1988 and developed many supplementary
modules and complex syntactic search features, has had a great impact on the study
of ancient Judaism. See also: Johanna Sprondel, “Toward a Humanities of the Digital? Reading
Search Engines as a Concordance," in The Making of the Humanities, vol. 3, The Modern Humanities,
ed. Rens Bod, Jaap Maat, and Thijs Weststeijn (Amsterdam: Amsterdam University Press,
2014), 479–93, 483.

19 Judaica Librarianship, last accessed May 2, 2018, https://ajlpublishing.org/. See, e.g., Diane 
Romm, "Retrieval of Judaica from Electronic Media: An Overview," Judaica Librarianship 8, no. 
time when the PC had become a common tool for scholars and the Internet had
arrived at universities, the Leo Baeck Institute organized a conference on “Prob- 
lems and Issues in Jewish Archives and Historiography in the Five New States of 
Germany." The meeting resulted in a plan to create a database of Jewish archival 
holdings in the states of the former German Democratic Republic (GDR) with the 
aim of enhancing access to these dispersed collections. In an echo of Ekstein's 
earlier vision, Robert Jacobs noted: “How very fortunate we are to live in a time
when electronic capabilities enable us to provide bibliographic references that 
will allow subsequent generations of scholars to find the materials they seek with 
no more than a few keystrokes."²⁰ The plan was an excellent example of the kind
of “database” that Ekstein had in mind. Importantly, the idea was for an online 
catalog, not a digitization project. 

Soon, however, the publication of primary sources in the form of CD-ROMs 
and online databases and textual editions began, and it would lead to a veritable 
sea change during the 1990s, enabling far easier access to source materials and 
fundamentally changing scholarly information management practices. The impact 
of these developments on the field of Jewish Studies was a major focus of Heidi 
Lerner's Perspectives on Technology column in the Association for Jewish Studies'
(AJS) Perspectives magazine, published on a regular basis between 2003 and 2011.²¹
Lerner's 2002 article “New Technologies and Old Methodologies: Jewish Studies
Research in the Digital Age" was probably one of the first to comprehensively
address the possibilities that the digital turn offered for Jewish Studies scholars.²²
And then as now, as Lerner's column highlights, librarians and archivists were at 
the forefront of digital developments and pointed the way for humanities scholars. 

The early 2000s was also the period of the transition from humanities computing
to what we now call “digital humanities," characterized by mass digitization,
“big data," and the proliferation of new tools and new forms of knowledge
dissemination. Digital humanities work in Jewish Studies in the past two decades 
has been made possible by a plethora of projects that bring together hitherto dis- 
persed, inaccessible, or fragile materials into digital archives and collections that 


1-2 (1993): 61-63 and Bella Hass Weinberg, "Judaica Librarianship in the Age of the Internet," 
Judaica Librarianship 9, no. 1-2 (1995): 3-5. 

20 Robert Jacobs, “Jewish Archival Holdings in the Five New States of Germany: Creating an Inventory,”
Judaica Librarianship 8, no. 1 (1994): 17-22, 22, https://doi.org/10.14263/2330-2976.1222.
21 AJS, Perspectives on Technology, last accessed May 2, 2018, https://www.associationforjew- 
ishstudies.org/what-is-jewish-studies/digital-jewish-studies/perspectives-on-technology. 

22 Heidi Lerner, “New Technologies and Old Methodologies: Jewish Studies Research in the Digital
Age," Shofar: An Interdisciplinary Journal of Jewish Studies 20, no. 4 (2002): 81-95,
https://doi.org/10.1353/sho.2002.0073.
facilitate digital scholarship. If dispersion is the common denominator among
many primary source collections for Jewish Studies, then the digital offers the 
possibility to virtually bring together dispersed materials. 

The issue of dispersion also relates to one of the key characteristics of the 
Jewish historical experience, migration and its transnational aspects. It is hardly 
surprising then that Jewish migration, especially East European, ranks promi- 
nently among digital resources. Among several well-known YIVO projects, such 
as the Encyclopedia of Jews in Eastern Europe, the exceptional Vilna Collections 
Project should be mentioned, which will offer unprecedented access to the largest 
archive on East European Jewish History in the world.²³ These projects exist side
by side with a huge variety of national and local history projects, which often 
also exhibit transnational dimensions, such as Memoria Viva and the Jewish 
Diaspora Collection (part of the University of Florida's Digital Collections) which 
focus on Latin America, including the Caribbean.²⁴ National, regional, and local
history projects are too numerous to mention. Some examples with clear transnational
dimensions include the South African Jewish Museum’s digital archive
and the South Africa Jewish Rootsbank, Yiddish Melbourne, and DigiBaeck (the
Digital Collections of the Leo Baeck Institute).²⁵ More nationally focused projects
include Key Documents of German-Jewish History and the Digital Library of the
Italian Foundation Center for Contemporary Jewish Documentation.²⁶ Other digitized
collections, such as the Sephardic Studies Digital Collection at the University
of Washington, aim to shed light on lesser known histories and languages of
Sephardic Jews and counter the loss of history and culture of dispersed and gradually
dwindling communities.²⁷

Next to migration, the Holocaust is probably the single most important topic, 
covered by a wide range of resources that include the European Holocaust Research 
Infrastructure (EHRI), Arolsen Archives, the USC Shoah Foundation Institute’s 
Visual History Archive Online (VHA Online), the German Memorial Book project, 
the Czech Holocaust Victims and Document Database, and the New York Public 
Library's Yizkor Book Collection.²⁸ As so many Holocaust-related sources and testimonies


23 See: https://vilnacollections.yivo.org/.

24 See: https://yivoencyclopedia.org/default.aspx; https://www.yivo.org/vilna-collections-project;
https://mviva.org/; https://ufdc.ufl.edu/judaica.

25 See: https://sajmarchives.com/; http://www.jewishroots.uct.ac.za/;
https://www.monash.edu/arts/acjc/yiddish-melbourne; https://www.lbi.org/collections/digibaeck/.

26 See: https://jewish-history-online.net/; http://digital-library.cdec.it/cdec-web/.

27 https://content.lib.washington.edu/sephardicweb/index.html.

28 See: https://ehri-project.eu/; https://arolsen-archives.org/; https://vhaonline.usc.edu/;
https://www.bundesarchiv.de/gedenkbuch/; https://www.holocaust.cz/databaze-obeti/;
https://digitalcollections.nypl.org/collections/yizkor-book-collection#/.
have become available online, scholarly attention has shifted in recent
years towards studying the nature of Holocaust memory in the digital age.²⁹

In terms of types of sources being digitized, textual sources such as newspapers,
manuscripts, and books dominate. The National Library of Israel's Historical
Jewish Press website contains, at the time of writing, 625 newspapers in 20
languages from all over the world, though with clear foci on the regions of East
Europe, the Middle East, and North America, Jewish languages such as Yiddish and
Hebrew, and non-Jewish languages, such as Arabic, Polish, French, and English.³⁰
The website provides a robust OCR and search function in four alphabets to allow
greater use of and access to the materials, which enables the kind of comparative
research that would have required extensive research trips only a decade ago. Other
newspaper databases complement these holdings; a notable resource
is Compact Memory, which contains 424 periodicals in nine languages, the bulk
being in German.³¹ Compact Memory is part of the digitized Judaica collections of
Frankfurt University Library, which offers thousands of digitized books.³² In this
respect one should also mention important resources such as the Yiddish Book
Center's Digital Yiddish Library, and the seminal International Collection of Digitized
Hebrew Manuscripts project (KTIV).³³

The example of KTIV also highlights an important point. While many librar- 
ies are digitizing their collections, work in digital humanities goes well beyond 
posting digital facsimiles online. It is important to distinguish between schol- 
arly work that simply uses "digital" tools (which, it could be argued, applies to 
nearly everything, since something as basic as word processing uses the digital to 
express words on paper) and research that uses computational tools and digital 
methods to analyze and interpret digitized materials. 

This distinction is abundantly clear in the case of Hebrew manuscript studies. 
Over the past three decades, Hebrew manuscripts have become the object of several
studies in the field of computerized document analysis, addressing questions such as script
classification, writer identification, layout segmentation, handwritten text recognition,


29 See especially: Jeffrey Shandler, Holocaust Memory in the Digital Age: Survivors’ Stories and
New Media Practices (Stanford, CA: Stanford University Press, 2017) and his chapter Digitizing
Holocaust Memories in this volume. For Holocaust research in the digital age see the various
articles in special issue no. 13 of Quest. Issues in Contemporary Jewish History in 2018. For the introduction:
Laura Brazzo and Reto Speck, “Holocaust Research and Archives in the Digital Age:
Introduction," Quest. Issues in Contemporary Jewish History 13 (2018): V-XIII.

30 See: https://www.nli.org.il/en/discover/newspapers/jpress.

31 See: https://sammlungen.ub.uni-frankfurt.de/cm/nav/index/title.

32 See: https://sammlungen.ub.uni-frankfurt.de/judaica/nav/index/all.

33 See: https://www.yiddishbookcenter.org/collections/digital-yiddish-library;
https://web.nli.org.il/sites/nlis/en/manuscript.
text to image alignment, and crowdsourcing.³⁴ Larger, accessible text corpora,
more powerful computers, and the (re)discovery of neural
networks also caused a huge increase in the application of Natural Language Processing
(NLP) to Semitic and Jewish languages. Only in the last five years has
performance reached production level for morphological, lexical and syntactical
tagging, named entity recognition, stylometrics, author attribution, text reuse,
topic modeling, and sentiment analysis even for historical texts.³⁵ Similarly, in
the field of computer vision applied to manuscript studies, after the pioneering
studies of the early years of the new millennium, as exemplified, for instance, by
the join-discovery tool of the Friedberg Genizah project (FGP),³⁶ the last five years
have led to breakthroughs that allow mass applications of previously unthinkable
scale.³⁷ The FGP, created in 2005 by and around Yaacov Choueka, assembles
images, transcriptions, and metadata on more than 250,000 fragments with more
than 600,000 images dispersed in more than 70 repositories worldwide.³⁸ It was
also one of the first projects worldwide in any discipline to apply cutting edge computer
vision, natural language processing, and data-mining on a very large scale to
historical documents.³⁹

Alongside the study of Hebrew manuscripts, digital humanities methods 
have been applied to the study of the Jewish book and Jewish languages. One 
such approach is offered by Marienberg-Milikowsky focusing on the study of reli- 


34 One of the first computer vision projects applied to Hebrew manuscripts was Laurence Lik- 
forman-Sulem, Henri Maitre, and Colette Sirat, “An Expert Vision System for Analysis of Hebrew 
Characters and Authentication of Manuscripts," Pattern Recognition 24, no. 2 (1991): 121-37. See 
also Itay Bar Yosef, Klara Kedem, Its'hak Dinstein, Malachi Beit-Arie, and Edna Engel, “Classifi- 
cation of Hebrew Calligraphic Handwriting Styles: Preliminary Results," DIAL (2004): 299-305. 
35 E.g. Avi Shmidman, Shaltiel Shmidman, Moshe Koppel, and Yoav Goldberg, *Nakdan: Pro- 
fessional Hebrew Diacritizer," ACL (demo) (2020): 197-203. Compare Shuly Wintner, *Hebrew 
Computational Linguistics: Past and Future," Artificial Intelligence Review 21, no. 2 (2004): 
113-38 with Amit Seker, Elron Bandel, Dan Bareket, Idan Brusilovsky, Refael Shaked Greenfeld, 
and Reut Tsarfaty, "AlephBERT: A Hebrew Large Pre-trained Language Model to Start-Off Your 
Hebrew NLP Application With," CoRR abs/2104.04052 (2021), https://arxiv.org/abs/2104.04052. 
See the contributions by Shmidman and Waxman in this volume. 

36 See Roni Shweka, Yaacov Choueka, Lior Wolf, and Nachum Dershowitz, *Veqarev otam ehad 
el ehad: Zihuy ktav yad vetseruf qit’ei hagnizah beemtsa'ut mahshev," Ginzei Qedem 7 (2011): 
173-209. 

37 See, e.g., Daniel Stókl Ben Ezra, Bronson Brown-DeVost, Pawel Jablonski, Hayim Lapin, Ben- 
jamin Kiessling, and Elena Lolli, *BiblIA: A General Model for Medieval Hebrew Manuscripts and 
an Open Annotated Dataset," HIP@ICDAR 2021: 61-66. 

38 Yaacov Choueka, *Computerizing the Cairo Genizah: Aims, Methodologies and Achievements," 
Ginzei Qedem 8 (2012): 9-30. 

39 Shweka et al., “Veqarev otam ehad el ehad.” 
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gious Hebrew and Aramaic manuscripts and languages.*° The increase in such 
initiatives in recent years is evidenced by projects such as Tikkoun Sofrim, which 
uses AI to transcribe Hebrew manuscripts, Footprints, which traces the move- 
ment of Jewish books through time and place, HaMapah, which traces rabbinic 
networks based on printed responsa, and Geniza Scribes, which invites "the 
crowd" to transcribe manuscripts in Hebrew, Arabic, and other languages, from 
the Cairo Geniza.⁴¹

Digital humanities methods are also used to facilitate historical research
beyond dispersed primary sources and across national borders. Some examples
are projects such as Mapping Modern Jewish Cultures,⁴² which explores space,
time, and multilingual Jewish communities through the lens of cafés in urban
environments; the Wikidata-based EHRI Ghettos⁴³ authority list of Holocaust-era
ghettos; Mapping Jewish LA,⁴⁴ which uses digital tools to highlight the diversity
of Jewish experiences in one single place; and the Documenting Judeo-Spanish⁴⁵
project, which aims to document and provide access to Sephardic texts written
in the Solitreo script.

Surveying all of these projects, a broader picture emerges. As Gerben Zaagsma
argued a few years ago, Jewish history presents specific challenges in the realm of 
information management due to its textual tradition, its diasporic nature, the — 
forced — migration of people, texts, ideas, and thus its transnational aspects. 
These characteristics, which can equally be applied to the field of Jewish Studies 
in general, are, in turn, reflected in both the state of Jewish heritage (dispersal 
of sources and objects) and its nature (multilingual, multiscriptual, and often 
textual). As a result, a key technological challenge for Jewish history in the digital 
age, and Jewish Studies more generally, is to work towards solutions for information
retrieval and analysis from dispersed, multilingual, and multiscriptual
sources.⁴⁶ Much work in this direction has been done in the past three decades
and it has allowed us to engage the transnational, interactional, inter-, intra- and 
cross-cultural dimensions of Jewish history in new ways. 


40 Itay Marienberg-Milikowsky, “Digital Research of Jewish Texts: Challenges and Opportunities,” 
in Textual Transmission in Contemporary Jewish Cultures, ed. Avriel Bar-Levav and Uzi Rebhun (New 
York: Oxford University Press, 2020), 15-25, https://doi.org/10.1093/oso/9780197516485.001.0001.
41 For digital editions of rabbinical texts see e.g. Chaim Milikowsky, “Scholarly Editions of 
Three Rabbinic Texts — One Critical and Two Digital,” in Advances in Digital Scholarly Editing, ed. 
Peter Boot et al. (Leiden: Sidestone Press, 2017), 137-46. 

42 https://richbrew.org/. 

43 https://portal.ehri-project.eu/vocabularies/ehri_ghettos.

44 http://www.mappingjewishla.org/. 

45 https://documentingjudeospanish.com/. 

46 Zaagsma, “Jewish Studies in the Digital Age.”
Fifteen years ago Paula Hyman argued for the importance of comparative
approaches to integrate what she called minority history into the history of
majority populations.⁴⁷ Around the same time Moshe Rosman sought to probe the
challenges that postmodernism posed for engaging with Jewish history, or Jewish
histories, including questions of Jewish identity, periodization, and intercultural
relations.⁴⁸ More recently, Zwiep and Wallet have suggested that “big data” might
be one answer to some of these questions as its longitudinal character can help
us explore the longue durée of Jewish history.⁴⁹ If indeed some of the key tenets of
Jewish historiography nowadays lie in studying inter- and intra-Jewish as well as 
Jewish/non-Jewish interactions, and in probing their fluid, constantly changing, 
and evolving nature in space as well as in and over time, then digitization and 
digital history are well placed to address the challenges involved. 

Online resources make comparisons within and between Jewish populations 
as well as between Jewish and non-Jewish populations easier than ever; they can 
help answer the crucial question of what was specific for which Jewish experi- 
ences, and for whose Jewish experiences. Computational techniques allow us to 
trace long-term trends in, for instance, newspapers and thus shifting discourses 
and concerns; network analysis can help chart the global migration of books and 
intellectual ideas and explore Jewish interconnectedness across borders; migra- 
tion and migrant experiences can be traced, explored, and compared through 
newspapers and other sources. Both the multiplicity and diversity of Jewish his- 
tories and experiences and the commonalities that unite and unify them are thus 
open for new and renewed explorations.⁵⁰

As exciting as many of these digital projects are, though, a note of caution is 
in order. The continuity of digital humanities work is intimately bound up with 
sustainable digital preservation, and obsolescence is a serious concern. Just as 
the original data from the LCAAJ project could not be accessed digitally without 
re-digitizing the texts and OCRing them to allow searchability, many digital pro- 
jects are no longer accessible because of obsolete hardware or software. An inno- 
vative virtual exhibition allowing a walkthrough of the Braginsky collection, vir- 
tually turning pages of manuscripts in exhibition cases, is no longer accessible 
because the Flash product that it used is no longer supported. Link rot abounds, 


47 Paula Hyman, "Recent Trends in European Jewish Historiography," Journal of Modern History 
77 (2005): 345-56. 

48 Moshe Rosman, How Jewish Is Jewish History? (Oxford: Littman Library of Jewish Civilization, 
2007). 

49 Bart Wallet and Irene Zwiep, “Session 0.8.1/II: Humanities in the Mirror: Writing Jewish His- 
tory in a Digital Key,” EAJS Quadrennial Congress, Kraków, July 15-19, 2018. 

50 See also: Zaagsma, “Exploring Jewish History in the Digital Age.” 
and many "cutting edge projects" of a decade ago are no longer accessible. There
is still much work to be done on this front. 


2 Taking Stock 


To take stock of the myriad developments that were outlined above, a range of 
events have been organized over the past decade. The Center for Jewish History 
(CJH) ran a workshop in 2011 entitled From Access to Integration: Digital Technol- 
ogies and the Study of Jewish History, which sought to "explore in a systematic 
way new approaches to coordinating and integrating the digitization of Jewish 
historical sources around the world.”⁵¹ The workshop also aimed to connect
Jewish Studies information specialists as a means of addressing the “challenges 
faced by many institutions in employing emerging technologies for the study of 
Jewish history.” Perhaps ironically, and as a strong illustration of
the abovementioned point about obsolescence, the conference website and other
associated materials are no longer online, and the only live content remaining is
the tweets using the hashtag #cjh-a2i.⁵²

In his keynote lecture during the workshop, entitled "Digitization and Its 
Discontents for Jewish History," historian Anthony Grafton outlined the various 
ways in which the digital turn is affecting academia and academic libraries, 
including the possibilities of digitally reuniting dispersed material. He also noted 
that much of Jewish scholarship happens outside academic circles, meaning that 
open access to online Jewish resources had become highly important.⁵³
Unfortunately, the various blog posts devoted to the conference do not reveal what, if any,
answers were formulated as to the new approaches and challenges mentioned
above, or indeed provide much detail as to what these were in the first place.⁵⁴ A
promised “white paper” following the conference never materialized.
On Twitter, though, we find some traces of the debate as it took place. Librar-


51 From Arthur Kiron's introduction to: Anthony Grafton, "Digitization and Its Discontents for 
Jewish History," a talk delivered at the International Conference From Access to Integration:
Digital Technologies and the Study of Jewish History, Center for Jewish History, New York, 2012, 2, last
accessed June 12, 2014, http://www.cjh.org/CJHGraftonDigitization/ (no longer available online). 
52 Workshop, From Access to Integration: Digital Technologies and the Study of Jewish History.
An archived version of the conference website's About page can be accessed here (accessed
November 30, 2021): https://web.archive.org/web/20120126155457/http://techconference.cjh.org/about.php.

53 Grafton, "Digitization and Its Discontents," 17-19. 

54 Last accessed March 20, 2018, https://16thstreet.tumblr.com/search/access-to-integration.
ian Deanna Marcum, for example, noted the international scope, long history,
and multilingualism as distinct features of Jewish Studies, while also stressing 
that all fields share certain fundamental needs for infrastructure, governance, 
funding, selection, etc.⁵⁵

The CJH workshop was the start of a decade full of DHJewish events. In 2012 
Brown University organized a workshop “Ancient Religions, Modern Technology” 
devoted to the ways in which the digital humanities has changed or can change the study
of religion in antiquity.⁵⁶ In 2013, the Institute for the History of the German Jews
(Institut für die Geschichte der deutschen Juden) in Hamburg organized the workshop
Jüdische Geschichte Digital (Digital Jewish History).⁵⁷ Taking stock of a wide
variety of digital projects pertaining to German-Jewish history, the event led to
the creation of the network Jüdische Geschichte Digital within the digital history
working group of the Historikerverband, the German Historical Association.⁵⁸

Meanwhile, the annual AJS conference and the European Association of 
Jewish Studies (EAJS) conference, held every four years, began to include panels 
and workshops on Jewish Studies and Digital Humanities. Both the 2012 and 2013 
AJS conferences featured a THATCamp Jewish Studies and in 2014 a THATCamp 
was organized in Haifa, Israel.⁵⁹ The Association of Jewish Libraries (AJL) held a
digital humanities roundtable at its 2014 meeting, and the keynote that year, by
Emile Schrijver, was titled “Between Being-Wise and Not-Knowing-What-To-Ask:
Jewish Librarianship and Digital Humanities.” AJL’s journal, Judaica Librarianship,
began publishing a regular review column of DHJewish projects in 2017.⁶⁰
In 2015, a conference entitled On the Same Page: Digital Approaches to Hebrew 


55 Last accessed March 20, 2018, https://twitter.com/search?l=&q=%23cjh-a2i&src=typd&lang-eng.

56 See the call for papers:
http://aramaicnt.org/2011/04/06/call-for-papers-ancient-religion-modern-technology-workshop/.

57 Jüdische Geschichte Digital workshop, last accessed March 20, 2018,
https://www.hsozkult.de/event/id/termine-22109, organized by Anna Menny and Miriam Ruerup.
For a conference report see: Gerben Zaagsma, Tagungsbericht Jüdische Geschichte Digital. 13.06.2013-14.06.2013,
Hamburg, in H-Soz-u-Kult, September 10, 2013, last accessed May 2, 2018,
www.hsozkult.de/conferencereport/id/tagungsberichte-5011.

58 Netzwerk Jüdische Geschichte Digital, last accessed April 5, 2018,
http://www.historikerverband.de/arbeitsgruppen/ag-digitale-gw/netzwerk-juedische-geschichte-digital.html.

59 With regard to the 2012 AJS THATCamp see: Jeffrey Shandler, “From the President,” AJS
Perspectives, Fall (2012): 3-4. For the individual camps: http://jewishstudies2012.thatcamp.org/;
http://jewishstudies2013.thatcamp.org/; http://haifa2014.thatcamp.org/.

60 See the introduction to the new column: Michelle Margolis, “JS/DH: An Introduction to Jewish
Studies/Digital Humanities Resources,” Judaica Librarianship 20, no. 1 (2017),
https://doi.org/10.14263/2330-2976.1293.
Manuscripts took place at King's College London.⁶¹ A follow-up EAJS round table,
Turning the Page: Jewish Print Cultures & Digital Humanities, at the University of 
Amsterdam in February 2017, dealt with “early modern print cultures and the 
specific questions associated with them, e.g. regarding Jewish multilingualism, 
geographical space, the linking of various disparate library and archive collections,
and methods, scales and techniques of textual analysis.”⁶² From “Tablet” to
“Tablet”: A Digital Humanities Workshop was held at the Institute for the History 
of the German Jews in Hamburg in September 2017.⁶³

Around the time the EAJS established a Digital Forum to engage with digital 
scholarship more comprehensively,⁶⁴ the 2018 EAJS conference in Kraków also
featured two “digital” sessions: one on Humanities in the Mirror: Writing Jewish
History in a Digital Key, which, by focusing on big data, aimed to “address the
question whether DH corpora and methods will enable us to find a new common
ground in the field of Jewish history” and reconsider its longue durée; the second,
on New Philologies: Hebrew Manuscript and Print Cultures in a Digital Key, focused
on editions and the application of machine learning.⁶⁵ In January 2021, the online
journal Reviews in Digital Humanities published a special issue on Jewish Digital
Humanities and since 2022 the European Journal of Jewish Studies has included a
section dedicated to DH and Jewish Studies.⁶⁶

In short, in recent years there have been several efforts to understand and 
discuss how the digital turn has affected Jewish Studies and what its intersection 
with digital humanities looks like. The present volume gathers selected papers 
from the international online conference #DHJewish - Jewish Studies in the Digital 
Age, which took place in January 2021 and was organized by the Luxembourg 


61 Conference: On the Same Page: Digital Approaches to Hebrew Manuscripts, last accessed 
March 20, 2018, https://www.kcl.ac.uk/artshums/depts/trs/research/seminars/jewish/hebrew2015.aspx.

62 EAJS Roundtable Report, “Turning the Page: Jewish Print Cultures & Digital Humanities,” 
University of Amsterdam, February 2017, last accessed September 7, 2017,
https://www.eurojewishstudies.org/colloquia/eajs-programme-in-jewish-studies/eajs-roundtable-report-turning-the-page/.

63 This workshop was initiated by Gerben Zaagsma and funded by the Rothschild Foundation and 
organized in cooperation with Miriam Ruerup at the Institute for the History of the German Jews. 
64 See: https://www.eurojewishstudies.org/digital-forum/eajs-digital-forum/. 

65 See the abstracts for both sessions here:
https://www.eurojewishstudies.org/digital-forum/eajs-conference-2018/.

66 See: https://reviewsindh.pubpub.org/v2-n1;
https://brill.com/view/journals/ejjs/ejjs-overview.xml.
Center for Contemporary and Digital History (C²DH). The aim of the Luxembourg
conference was to take stock of how the digitization boom of the last two decades, 
and the rapid advancement of digital tools to analyze data in myriad ways, have 
opened up new avenues for Jewish Studies research. It sought to answer the ques- 
tions of how digital developments can be harnessed to address specific questions 
and problems in the field, and what the current state of the art looks like. Sup- 
ported by an international program committee, it brought together more than 60 
scholars and heritage practitioners to discuss how the digital turn affects the field 
of Jewish Studies.⁶⁸ Importantly, the conference was not a stand-alone event but
part of a bigger project that includes the online portal #DHJewish — Jewish Studies 
and Digital Humanities, launched in June 2022. #DHJewish offers a database of 
projects, a news and events section, blog posts, and links to various relevant
bibliographies as well as an online Zulip community.⁶⁹


3 Sections of the Volume 


The present volume contains papers based on presentations given during the 
2021 #DHJewish conference, and is divided into four sections: Collections, Spatiality,
Text, and Computational. This subdivision should be thought of in terms
of where the main emphasis of the individual chapters lies, though the sections
intersect in many ways. Thus, the first section, Collections, features chapters that
take digital resources, and reflect upon their use in Jewish Studies, as their start- 
ing point. The chapters in the second section, Spatiality, all revolve around the 
affordances of spatial humanities approaches to understanding Jewish history 
and primary sources. The section Text features chapters that discuss various 
methods to engage with research focusing on and working with textual sources 
while the final section, Computational, includes chapters that all employ analyti- 
cal methods derived from computational linguistics. Importantly, the various sec- 
tions provide a mix of project-oriented case studies and research-oriented digital 


67 See: https://www.c2dh.uni.lu/events/dhjewish-jewish-studies-digital-age. The full archive
of the conference with session recordings can be found here:
https://www.morressier.com/o/event/5fd2237e54bbb7f516f76f1b.

68 The program committee consisted of: Michelle Margolis, Rachel Deblinger, Karin Hofmeester, 
Gabor Kadar, Amalia Levi, Anna Menny, Miriam Ruerup, Sinai Rusinek, Avi Shmidman, Daniel 
Stoekl Ben Ezra, Dov Winer, Gerben Zaagsma (Chair), and Irene Zwiep. See also:
https://www.c2dh.uni.lu/events/dhjewish-jewish-studies-digital-age.

69 See: https://dhjewish.org/. 
scholarship. As such, the present volume also seeks to provide a glimpse of the
various directions that digital humanities work in Jewish Studies can take. 


3.1 Collections 


The first section, Collections, features four chapters and starts with the confer- 
ence opening keynote lecture by Jeffrey Shandler, who takes the reader from 
the first recorded interviews with Holocaust survivors in Europe, conducted by 
US-based psychologist David Boder in the immediate aftermath of World War II, 
to today's new approaches in using immersive storytelling through new visualization
techniques. Shandler shows us that the use of all material, whether “born
analogue” or “digital,” depends on the interaction of the user with the interviews.
While each technology enables us to experience a different type of encounter with 
eyewitnesses, these are always embedded in our own knowledge and research 
questions, which frame our engagements. 

Shandler's chapter is followed by that of Inna Kizhner, Melissa Terras, and 
their co-authors, who discuss the question of how Jewish culture is represented 
in museum collections. They provide us with a comparative survey of metadata
on “Jewish” collections in the Metropolitan Museum of Art in the United States
and the State Catalogue of Museum Collections of the Russian Federation. The survey
offers an entry point into the complexities of international metadata standards
and addresses the importance of universal standards in providing equal access
to collections.

Following this analysis of how digital collections are constituted, Jakub 
Mlynář, Jiří Kocián, and Karin Hofmeisterova address the question of how 
users engage with them, specifically how search engines and search-query 
techniques shape our results. They present us with a small-scale study on how 
users at the Malach Centre for Visual History (CVHM) at the Charles University in 
Prague search in corpora of audiovisual Holocaust testimonies such as, among 
others, the Visual History Archive (VHA) and the Fortunoff Video Archive for Holocaust
Testimonies. Through their case study they show how search engines and
search-query techniques shape our findings and how, ultimately, “tools” thus produce
“data.”

Finally, Anna Bonazzi's essay delves into the challenges involved in index- 
ing the content of digital resources. She discusses work to semi-automate content 
indexing and analysis of Holocaust testimonies based on N-grams. Traditional, 
manually assigned keyword-based indexing of testimonies emphasizes verbal 
content terms such as noun-based facts, names, and historical references that we 
expect survivors to use. Bonazzi argues for an alternative indexing system that is 
not limited to the identification of keywords to summarize content and presents a
semiautomated DH approach based on N-grams. This approach allows us to iden- 
tify patterns and narrative structures that go unseen in traditional keyword-based 
indexing of the survivors' testimonies, including structural, non-verbal catego- 
ries like uncertainty, reticence, and emotional insistence on time references. 


3.2 Spatiality 


In the second section, Spatiality, four chapters illustrate the breadth of spatial 
approaches that are current in Jewish Studies. The essay by Sinja Clavadet- 
scher, Stefanie Mahrer and Stefanie Salvisberg discusses the forced migration 
of Jewish academics from Nazi Germany within the context of a broader research 
project about Switzerland and academic forced migrants from 1933 to 1950. The 
research uses nodegoat to map the expulsion of academics and the resulting
transnational academic network in order to shed light on inner-institutional changes
with respect to both academic staff and the academic standing of the research
institutions involved.

Maja Hultman uses a GIS approach to debunk established historiograph- 
ical narratives of a spatially inscribed, dichotomized Jewish urban experience 
in early 20th-century Stockholm. The project interrogates the supposed division 
between integrated, Reform, northern-residing Jews and Eastern European, poor, 
orthodox, southern-residing Jews. In this project, GIS is used as a tool to facilitate 
quantitative analysis of primary sources over the city's topography, in tandem 
with qualitative sources. Hultman explores the role of this unique topography 
in shaping complex Jewish approaches to social integration, religious practices, 
and communal relations. 

Piergabriele Mancuso's essay highlights the centrality of archival collec- 
tions and research for developing virtual reality and 3D modeling reconstruc- 
tions. Using the Ghetto Mapping Project as a case study, Mancuso outlines the 
various and heterogeneous types of material used to assess not only the architec- 
tural features of the built environment, but also demographic and socioeconomic 
trends of the long presence of the Jewish community in the Florence ghetto, even- 
tually seeking to understand the politics of segregation. 

Finally, Daniel Stein Kokin presents a project that can simultaneously serve 
as a tool for research and pedagogy and as a political intervention that raises 
awareness of the long history of what we often take as a given: the “points” on
our maps. Based upon maps and lists of settlements in the territory of Palestine 
and what today is Israel from the 1840s to the present, Stein Kokin shows how 
our contemporary topography is ultimately a temporary point (nekudah) on
the map which may have replaced another point before and might be followed
by yet other points in the future. Ultimately, he problematizes the chronological 
matter-of-factness of what we see on our maps. 


3.3 Text 


The third section, Text, introduces a variety of methodological perspectives that 
can be brought to bear upon Jewish textual sources. Benjamin Lee reports on the 
use of machine learning techniques for extracting and analyzing visual content 
of early 20th-century Ladino newspapers. Lee scales up scholarship using the 
Newspaper Navigator tool to extract photographs, illustrations, maps, comics, 
and editorial cartoons alongside advertisements, and facilitate transnational 
analysis of the Sephardic Jewish experience. Beyond reporting on the actual work 
of creating and analyzing the dataset, Lee offers insights about interdisciplinary 
collaborations for digital humanities work in Jewish Studies, as well as reflec- 
tions on ethical considerations when applying machine learning techniques to 
Jewish cultural heritage collections. 

Abby Gondek's essay explores the question of how a network analysis and 
visualization tool such as nodegoat can help us overcome biased perceptions of 
how political agendas were shaped. Her case study of Henrietta Stein Klotz, assis- 
tant to Henry Morgenthau Jr., Secretary of the US Treasury between 1934 and 1945, 
shows how Klotz influenced and shaped Morgenthau's positions in response to 
the Holocaust. Her sophisticated multi-layered network analysis allows her to 
trace the gendered dimensions of interpersonal networks and the political influ- 
ence of actors that so often remain invisible. 

Zef Segal analyzes the contents of the periodical HaTzfira over a six-year 
period, before and after its transition from a weekly to a daily newspaper, and
shows what content changes in a newspaper can reveal about readers' interests 
and concerns. Segal uses topic modeling to show how local events and genera- 
tional change led to a shift from more scientific content to a widening focus on 
world events, politics, and Jewish-related news. 

Tatsiana Astrouskaya, finally, uses the well-known programming language 
R to study and analyze the correspondence of Russian refuseniks in their strug- 
gle to leave the Soviet Union. Focusing on the story of Ernst Markovich Levin, 
Astrouskaya analyzes the quantitative and the qualitative dimensions of his 
activities petitioning various authorities. Her argument-driven digital historical 
analysis shows how human and machine readings can complement each other
even in relatively “small data” studies.
3.4 Computational


The final and most specialized section of the volume, Computational, features 
four chapters that showcase some of the latest developments in computational 
techniques as applied to Jewish manuscripts. Turning to a print from the 18th 
century for a highly structured corpus of texts from early manuscripts, Luigi 
Bambaci reminds us that data for 21st-century computational analysis can be 
found centuries earlier. Bambaci’s chapter describes his work digitizing, parsing, 
and encoding the Kennicott Bible (1774-1776) to better analyze and understand 
textual elements in manuscripts of the Hebrew Bible up to constructing the stem- 
matic tree. One of the most difficult aspects of studying post-canonical Jewish 
texts is identifying biblical citations that are not exact. Historical writers of reli- 
gious texts often cited the Bible but might play around with the citation to fit a 
poem or the context at hand. 

Avi Shmidman presents an algorithm and tool to automatically detect bibli- 
cal paraphrastic quotations in a given text, a non-trivial task when they are short 
and not identical to the Hebrew text. This groundbreaking work has already been 
shown to be a tremendous tool in textual research in Jewish Studies. 

Daria Vasyutinsky, Jihad El-Sana and colleagues introduce their work in 
applying deep learning models to classify script types and sub-types in medieval 
Hebrew manuscripts. Incorporating, inter alia, the techniques and databases of 
Hebrew paleography, their research project is part of a broader ongoing effort to
develop algorithmic tools for processing historical documents. 

Finally, Joshua Waxman’s essay addresses the need for modern punctuated 
versions of classical Jewish texts which were often composed without punctua- 
tion. He describes an automatized system that was developed to create a punctu- 
ated version of the Hebrew and Aramaic of the Babylonian Talmud. This has the 
potential to assist greatly in Talmud use and study. 


As we hope this introduction has made clear, the field of Jewish Studies has 
always been shaped by the uptake of new technologies. It is therefore important 
to acknowledge the groundbreaking and pioneering work that has led to our 
present moment and can be traced back to at least the 1950s. As Jewish Studies 
have now firmly moved into the digital age, we hope that the essays in this volume 
will enhance critical reflection on the methodological and epistemological con- 
sequences for our field and encourage the further uptake of digital approaches to 
bring about its full potential. 
Digitizing Holocaust Memories


Abstract: Digital media have enabled expansive opportunities to engage the Hol- 
ocaust and its remembrance, especially by facilitating new forms of research. In 
particular, digitized audio or video recollections of survivors and other eyewit- 
nesses of the Holocaust attract a wide range of users. These resources, whether 
“born analog” materials that have been digitized or more recent materials that
were created in digital mediums, pose particular challenges with regard to their 
preservation, archiving, cataloging, indexing, and accessing, even as they offer 
distinctive opportunities for engaging with Holocaust remembrance. Digitiza- 
tion not only facilitates accessing these materials but also enables their remedi- 
ation in new formats, shaping how these materials can be accessed through dif- 
ferent means of distribution. These factors, in turn, inform how these resources 
are studied. The scope of materials includes documentations of pre-Holocaust 
Jewish culture, including collections of prewar photographs and Yiddish language
and folklore, many of which are in digital form. But the largest corpus of
resources is found in institutional archives of Holocaust survivors’ memories,
housing tens of thousands of life histories recorded in audio or video formats. 
These collections straddle a temporal boundary marked by both the impending 
loss of living witnesses to the Holocaust and the transition from the “video age”
to the “digital age,” making them especially resonant for studying Holocaust
memory practices. At the same time, concern for providing future generations
with access to Holocaust survivors' life stories has engendered the use of new
digital technologies, namely interactive holography and immersive virtual reality, to
present these narratives.


Keywords: archiving, digital media, Holocaust memory, Holocaust survivors, index- 
ing, life history, remediation, video 


Digital media have enabled expansive opportunities to engage the Holocaust 
and its remembrance. These include new forms of research, such as using
non-invasive archeological survey techniques to create three-dimensional visualizations
of killing sites.¹ Digital media also facilitate new means of imagining Holocaust
scenarios, whether through gaming or through role-playing videos posted


1 See, e.g., “Recording Cultural Genocide and Killing Sites in Jewish Cemeteries,” accessed
January 22, 2022, https://holocaustkillingsites.com/.


Open Access. © 2022 Jeffrey Shandler, published by De Gruyter. This work is licensed under
the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
on TikTok.² Among the most frequently encountered examples are digitized recollections
of survivors and other eyewitnesses of the Holocaust. The range of users
includes high school students, genealogists, descendants of survivors, performing
and visual artists, and museum visitors. These resources are also used by an array 
of scholars in the humanities and social sciences. In addition to those working in
Jewish Studies, these include scholars of genocide, trauma, and memory practices, among
other topics.³

Given this extensive and varied engagement, examining the nature of these 
resources' mediation offers insight into the impact of the digital turn in schol- 
arship and contemporary culture generally. Though digitized memories of the 
Holocaust include many written texts, this essay focuses on audio and visual 
examples. These resources pose particular challenges with regard to their preser- 
vation, cataloging, and accessing, even as they offer distinctive opportunities for 
engaging with Holocaust remembrance. 

Most of this material was not “born digital” but was converted from various 
analog mediums to a digital format. Therefore, digitizing these resources con- 
stitutes a remediation, which shapes how they are accessed through different 
means of distribution, whether compact disks, downloadable files, postings
on websites or social media, or online streaming video. And this distribution, in 
turn, shapes how these resources are studied. The following discussion considers 
“born analog” examples first and then turns to more recent materials that were
created in digital mediums. 

Digitization has made some of the earliest remembrances of Holocaust sur- 
vivors newly available, sometimes after decades of absence. Among these are 
the pioneering audio interviews conducted by American psychologist David 
Boder in 1946. To document survivors' wartime experiences "in their own voice," 
Boder traveled to Europe to conduct over 100 interviews with survivors in nine 
different languages, using a magnetic wire recorder (then state-of-the-art
equipment).⁴ In 1949, Boder published a book on these recordings, I Did Not Interview
the Dead, and continued to work on them, but his interviews fell into obscurity 


2 On video games, see, e.g., Claudio Fogu, “Digitalizing Historical Consciousness,” History and
Theory 48, no. 2 (2009): 103-21. On TikTok, see, e.g., Nicole Froio, “We Asked TikTokers Why
They're Pretending to Be Holocaust Victims,” Wired, August 21, 2021, accessed August 24, 2020,
https://www.wired.co.uk/article/tiktok-holocaust-pov. 

3 For a bibliography of this scholarship, see Victoria Grace Walden, “Reading about Digital Holocaust
Memory,” accessed January 22, 2021,
https://digitalholocaustmemory.wordpress.com/2020/07/10/reading-about-digital-holocaust-memory/.

4 See Alan Rosen, The Wonder of Their Voices: The 1946 Holocaust Interviews of David Boder 
(New York: Oxford University Press, 2010). 
following Boder's death in 1961.⁵ Then, due to growing interest in Holocaust survivor
memories in recent decades, the interviews were digitized and posted on
a website by the Illinois Institute of Technology, Boder’s institutional home and
now the keeper of some of his original recordings.⁶ The website sorts the interviews
by various criteria, including the language and location of the interview
and the gender, religion, and wartime experiences of the respective interviewee. 
In addition, the site facilitates the use of these recordings, which are not of the 
clearest sound quality, with transcriptions, translations, and annotations, plus 
background information on Boder and his project. Thus, the website appears to 
fulfill Boder's original aspirations to make these materials available for further 
study, something he could not realize in his lifetime. Yet the recordings' digitiza- 
tion transforms how they are encountered, and the website's additional content 
reflects the distinct interests of a later generation of scholars. 

At the same time that Boder undertook these interviews, folklorists started 
collecting various materials from Holocaust survivors. These early efforts include 
Yiddish folksongs documented by Ruth Rubin from informants in Canada and 
the United States, beginning in 1946, and by Ben Stonehill, who made over 1,000 
wire recordings of songs performed by Holocaust survivors in New York in 1948.7 
Some songs pertain specifically to the Holocaust, such as those composed and 
sung in wartime ghettos. But these collections documented the wider repertoires 
of Jewish folksingers, as part of a larger range of efforts to preserve something of 
the cultures decimated and dispersed by the genocide. Both Rubin's and Stone- 
hill’s collections are now online. Their websites catalog the songs according 
to a thematic taxonomy and provide translations of lyrics and information on 
the singers and sources when known. As with Boder's interviews, these online 
resources are remediations, shaped by the input of scholars and technicians, to 
present resources to a new generation of users, for whom the material has new 
value. 

Efforts to document prewar Jewish culture include several projects focused 
on Yiddish language and culture, some of which have been digitized. The earliest 
of these is the Language and Culture Atlas of Ashkenazic Jewry, initiated by lin- 
guist Uriel Weinreich at Columbia University in the 1950s. Audiotaped interviews 
with over 600 native speakers of Yiddish from across northern Europe have subse- 
quently been digitized and posted online, as have scans of this project's extensive 


5 David P. Boder, I Did Not Interview the Dead (Urbana: University of Illinois Press, 1949). 

6 "Voices of the Holocaust," accessed January 22, 2022, http://voices.iit.edu/. 

7 "The Ruth Rubin Legacy Archive of Yiddish Folksongs," accessed December 24, 2018, 
https://exhibitions.yivo.org/exhibits/show/ruth-rubin-sound-archive/home; “The Stonehill Jewish Song 
Collection," accessed June 3, 2018, http://www.ctmd.org/stonehill.htm. 
paper archive. More recent projects documenting native Yiddish speakers born 
in prewar Europe include the Archives of Historical and Ethnographic Yiddish 
Memories, known by the acronym AHEYM, which means "homeward" in Yiddish. 
This project was launched in 2002 by linguist Dov-Ber Kerler and historian Jeffrey 
Veidlinger at Indiana University. AHEYM includes video interviews with over 300 
individuals identified as "Eastern Europe's last native speakers of Yiddish," all 
born before World War II. The project's interview topics are wide-ranging, includ- 
ing "linguistic ... data, oral histories of Jewish life in Eastern Europe, Holocaust 
testimonials, musical performances ..., folk narratives, ... reflections on contemporary 
Jewish life in the region, and guided tours of [local] sites of Jewish memory."9 
Weinreich's Atlas, which made early use of computers to correlate data, 
focused on gathering information for research on the historical foundation of 
modern Yiddish dialects. AHEYM, by contrast, is more wide-ranging in content, 
while it centers on the recollections of Holocaust survivors who returned to their 
prewar homes in provincial East European towns as embodiments of a vestigial 
“shtetl” culture. 

This impulse to document what is often called the “lost world” of East Euro- 
pean Jewry as an act of Holocaust remembrance includes collecting prewar pho- 
tographs. Though made before the war, these images acquire new significance as 
records of lives destroyed by genocide. Some photography collections that have 
been digitized and placed online focus on a particular locale - for example, Mike 
Marvin's assembly of photographs from Szczuczyn, Poland, where his grand- 
father, Zalman Kaplan, operated the town's only photography studio from the 
1890s through the 1930s.10 Other collections are more wide-ranging, having been 
gathered and digitized by research organizations. Among these is “People of a 
Thousand Towns," an indexed catalog of the extensive holdings of photographs 
of Jewish life in prewar Eastern Europe presented online by the YIVO Institute 
for Jewish Research. This project began in 1981, when the institute placed 15,000 
digitized images on a videodisc - then a brand-new technology - linked to a 
computer catalog, which, at the time, could only be used on site at YIVO's New 


8 "Language and Culture Atlas of Ashkenazic Jewry," accessed June 3, 2018, 
http://library.columbia.edu/locations/global/jewishstudies/lcaaj.html. For audio recordings from the Atlas, see also 
“EYDES: Evidence of Yiddish Documented in European Studies," accessed June 3, 2018, 
http://www.eydes.de/. 

9 "AHEYM: The Archives of Historical and Ethnographic Yiddish Memories," accessed June 3, 
2018, http://www.iub.edu/~aheym/. 

10 “The Zalman Kaplan Collection of Pre-war Photos of Szczuczyn," accessed January 22, 2021, 
http://www.szczuczyn.com/kaplan.htm. See also Louis D. Levine, ed., Lives Remembered: A 
Shtetl through a Photographer's Eye (New York: Museum of Jewish Heritage, 2002). 
York headquarters.11 In 2002 the photographs and catalog were placed online and 
made publicly accessible.12 This move is typical of the evolution of digitized 
materials: as technology advances, the possibilities for accessing resources expand. 

An essential component of digitizing these photographs for researchers to 
access is the creation of databases of the images' metadata. This information, 
extrinsic to the photographs themselves, provides the point of entry to their use, 
shaping how they can be searched. Databases of photographs typically include 
generic descriptions of their content, as well as specific information, when 
known: their locations and dates, the names of their subjects and photographers. 
Less often recorded, but no less valuable, is information about photographs' 
provenance, given their sometimes charged trajectory during and after the war. 
Including this information distinguishes the collecting project known as "I ciągle 
widzę ich twarze / And I Still See Their Faces.” It was initiated in the 1990s by 
Fundacja Shalom, an organization dedicated to promoting Jewish culture in Poland. 
The project gathered photographs of Jews in the possession of Polish citizens, 
yielding hundreds of images. Many came from people who did not know the iden- 
tity of these photographs’ subjects — only that they were Jews. Therefore, “And 
I Still See Their Faces" derives much of its value not from the information that 
the images provide about Polish Jewish life, but from the disrupted history of 
the physical photograph. Sometimes this history is evinced by the photograph's 
damaged condition, thereby materializing the haunting power of the loss of 
people and their culture.13 

The photographs and recollections that Fundacja Shalom gathered for “And 
I Still See Their Faces" have been presented in a book, exhibition, and online.14 
More recently, the project expanded and transformed its digital presence on a 
website called Żydzi Polscy (Polish Jews). In addition to providing materials the 
Foundation had collected, the website invites visitors to post their own memories 
of Jewish relatives as well as letters, diaries, or photographs. In this way, the 
Foundation asserts, the website - and the story of Polish Jewry - will not "have a 
last page." Żydzi Polscy explains that it "is open for everyone, history lovers and 


11 Jeffrey Shandler, “‘People of a Thousand Towns’: YIVO's Videodisc Project,” Jewish Folklore 
and Ethnology Review 10, no. 1 (1988): 18. 

12 “People of a Thousand Towns,” YIVO Institute, accessed January 22, 2021, 
http://yivo1000towns.cjh.org/search_tips.asp. 

13 And I Still See Their Faces: Images of Polish Jews / I ciągle widzę ich twarze: Fotografia Żydów 
polskich (Warsaw: Fundacja Shalom, 1996). 

14 "And I Still See Their Faces," Fundacja Shalom, accessed January 22, 2021, 
http://shalom.org.pl/en/i-ciagle-widze-ich-twarze-2/. 
witnesses, scientists and research centers, who would like to join the ... effort 
to recreate the world of Polish Jews." This “continuously updated material" 
renders Żydzi Polscy “a living panorama of the Jewish community in Poland." 
This internet platform, interactive and renewable, fosters a virtual community 
of people interested in the topic of Polish Jewry and models cultural liveliness 
for the present activity of memory work, as this undertaking strives to animate a 
“lost world.” 

Institutional collections of Holocaust survivors’ memories house tens of thou- 
sands of their personal histories, many now available in digital form. These efforts 
began in the war’s immediate aftermath and continue to the present, undertaken 
in a variety of mediums. For example, Yad Vashem started archiving survivors’ 
written accounts in the late 1940s, shortly after the State of Israel’s national center 
of Holocaust commemoration and documentation had been established. Yad 
Vashem began recording survivors on audiotape in 1954 and videotape in 1989.16 
Overall, dozens of projects to make recordings of Holocaust survivors have been 
undertaken in multiple countries in the Americas and Europe, as well as in Israel 
and Australia. The United States Holocaust Memorial Museum currently houses 
over 70,000 audio and video interviews with Holocaust survivors and eyewit- 
nesses, whether online or in its Washington DC headquarters. These recordings 
have been gathered from almost 100 different sources, ranging from large-scale 
collecting projects to local community efforts and individual undertakings.17 

Digitization not only facilitates accessing these interviews; it also enables 
their remediation in new formats. For example, the University of Southern Cal- 
ifornia Shoah Foundation recorded over 51,000 video interviews, mostly in the 
mid- to late 1990s, for its Visual History Archive (hereafter, VHA).18 The largest 
such collection by far, its holdings have been remediated in a variety of formats. 
Especially significant is an online platform the foundation launched in 2009 
called IWitness, which provides a "guided exploration" of over 1,500 videos in 
the VHA, primarily for use in American secondary schools (most interviews are in 
English). IWitness explains that it combines learning "first hand from survivors 


15 "Żydzi Polscy," accessed January 22, 2021, http://www.zydzipolscy.pl/. 

16 See “About the Yad Vashem Archives," accessed November 20, 2012, 
www1.yadvashem.org/yv/en/about/archive/about_archive_whats_in_archive.asp; and Yad Vashem Collection of 
Testimonies, "Listing of the Record Groups in the Yad Vashem Archives," 12, accessed November 20, 
2012, www1.yadvashem.org/yv/en/about/archive/pdf/list_of_record_groups.pdf. 

17 United States Holocaust Memorial Museum, Collections search: audio visual oral histories, 
accessed January 21, 2021, 
https://collections.ushmm.org/search/?f%5Bf_audiovisual%5D%5B%5D=testimony&q=oral-histories. 

18 “USC Shoah Foundation, Visual History Archive,” accessed January 21, 2021, 
https://vhaonline.usc.edu/login. 


Digitizing Holocaust Memories — 31 


and witnesses of the Holocaust” with “participatory” use of the videos through 
a “built-in online video editor,” which enables students to “build [their] own 
video projects.” Like Żydzi Polscy, IWitness uses the internet to enable interactive 
engagement with digitized resources - not by contributing to its holdings but 
by inviting their sampling. The Shoah Foundation explains that IWitness offers 
students the opportunity to learn about the impact of the Holocaust on individ- 
ual lives while also learning “important digital media skills, including ... ethical 
remixing, that will prepare [students to become] digital citizen[s] in the 21st 
century.” Through this platform, the Foundation promotes its videos as moral 
touchstones, even as it invites students to sample the collection to create new 
cinematic narratives. IWitness characterizes the exercise as an opportunity for 
ethical instruction in its own right, extending the Foundation's conviction that 
viewing the Archive's interviews has a morally galvanizing power. 

One of the first major video archives of Holocaust memories established in 
the United States recently initiated a different kind of remediation of its digitized 
holdings. Begun in the late 1970s, the Fortunoff Video Archive for Holocaust 
Testimonies recorded over 4,000 videos, now housed at Yale University.20 Starting 
in 2019, the Archive has produced a series of podcasts titled “Those Who Were 
There: Voices from the Holocaust." The series presents excerpts from audio tracks 
of video interviews, supplemented with narration and musical scoring. Similar to 
IWitness, this remediation addresses a moral imperative. The podcasts' website 
explains that “it is our duty to listen and share, so that the horrific events of the 
Holocaust do not fade from memory.”21 Also like IWitness, “Those Who Were 
There" appears to have embraced newer media in order to engage younger audi- 
ences. As one reviewer of the project observes, it “applies a twenty-first century 
social media strategy to reach out to a generation ... weaned on Instagram and 
Snapchat."22 The pedagogical aims of “Those Who Were There" are evident in 
supporting materials on its website: interview transcripts, vintage photographs 
and documents, additional readings and historical background on the Holocaust. 

By dint of the interviews' digital format, their remediation in these projects 
entails both reducing and supplementing the original recordings: in IWitness, 


19 "About/IWitness," accessed January 3, 2015, http://iwitness.usc.edu/SFI/About.aspx. 

20 "Fortunoff Video Archive for Holocaust Testimonies," accessed January 21, 2021, https://fortun- 
students select clips from VHA videos; each podcast episode of “Those Who Were 
There" presents up to 25 minutes of audio culled from longer interviews. At the 
same time, both projects integrate additional elements: music and narration are 
added in the podcasts, and the IWitness editing platform enables students to 
integrate these and other elements into their videos. As these examples demon- 
strate, the digital remediation of works of Holocaust memory can have epistemo- 
logical consequences. The original resource, itself a mediation of recollections, 
can be segmented and incorporated with other material, to create new works of 
remembrance. 

Similarly, the means of accessing digitized works of Holocaust memory through 
cataloging, transcribing, translating, and tagging can inform how these resources 
are engaged, as exemplified by the Shoah Foundation's Visual History Archive.23 
Among the many projects undertaken to record interviews with survivors and eye- 
witnesses of the Holocaust, the VHA stands out on several counts: not only as the 
largest such collection, but also as the most diverse, with interviews conducted in 
56 countries and 32 languages, encompassing a range of Holocaust survivors and 
eyewitnesses. In recent years the Foundation has expanded the Archive's holdings 
to include interviews with Holocaust survivors collected by other organizations as 
well as films and videos of survivors or eyewitnesses of other atrocities. These range 
from the Armenian genocide, which began in 1915, to Rohingya survivors of mass 
violence in Burma in 2017. The VHA is also the most widely accessible major collec- 
tion of Holocaust interviews, available in a variety of formats, including documen- 
tary films, educational DVDs, the aforementioned IWitness platform, and an online 
archive housing the entire collection, which facilitates searching this extensive body 
of material with an elaborate indexing system. As a result of its pioneering work 
on indexing and cataloging videos, as well as creating a digital platform for their 
viewing online, the Shoah Foundation holds multiple patents on its technological 
innovations.24 

The scope of the VHA's index has also evolved as the archive's holdings have 
expanded - adding, for example, indexing terms specific to the interviews with 
survivors or eyewitnesses of atrocities other than the Holocaust (e.g., "Armenian 
culture," "attitudes toward Hutu," “foot binding"). In addition, indexing terms 
that are relevant to multiple atrocities — such as “political identity,” “post-conflict 
justice,” and “trauma-related dreams" — now enable researchers to make compar- 
isons among these accounts of different historical events. The index indicates as 
well when search terms have been added to the VHA's search mechanism, evinc- 
ing an evolving understanding of what topics researchers may wish to explore. 

The VHA is especially noteworthy because it was created on the cusp of the 
transition from analog to digital media. Since its inception, the Shoah Foundation 
has grappled with the possibilities and challenges posed by an expanding array 
of new technologies. The Foundation recorded its interviews in the mid-1990s 
on Betacam SP format analog videotapes, then the broadcast industry standard. 
Concurrently, digital media and the internet were coming into widespread use, 
fostering new possibilities for creating and engaging with information. Digital 
and online technologies soon became central to the Archive's agenda, facilitating 
the preservation, indexing, and accessibility of its collection. 

Thus, the VHA straddles both the temporal boundary marked by the impending 
loss of living witnesses to the Holocaust and the transition from what has been 
called the “video age" to the “digital age." The Shoah Foundation set out to use 
state-of-the-art technology to preserve the memories of survivors as their passing 
approached. Yet its archive evinces how mutable memory practices are - how 
quickly the mediums employed have changed, plus the fact that newer mediums 
are less stable than older ones. In this respect, the VHA exemplifies a challenge 
that Holocaust remembrance has confronted from the start. As Holocaust studies 
scholar Alan Rosen notes, Holocaust memory projects have always “been bound 
up ... with technological advance and obsolescence.”25 

Digitizing the Shoah Foundation’s videos facilitated their cataloging and, especially, 
their indexing. As in other projects discussed above, digitization enables 
searching this large collection of videos by dint of their metadata. Moreover, the 
VHA's index expands greatly the possibilities for searching within and across 
interviews, thereby transforming how they can be used. The Shoah Foundation 
initially decided to index rather than transcribe the videos, as a more expedient 
undertaking and a more useful aid to researchers.26 As taping interviews proceeded, 
the Foundation began developing a matrix of search terms specific to the 
Archive. 

Like the VHA itself, the archive's index is vast, with over 50,000 search terms. 
They are keyed to particular interview segments during which interviewees 
discuss these terms. In addition to looking for names of particular people, places, 
events, or institutions, VHA users can search the content of videos through a graduated 
taxonomy of subject terms. The index facilitates both expanding and constraining 
searches. Its subject terms are generalized, enabling users to cross-reference 
interviews conducted in various locations and languages and sometimes 
across different genocides. Each search term can also be narrowed by the criteria 
of interview language, gender of interviewee, and "experience group" (such as 
Roma survivors or liberators). Searching coincident terms (for instance, finding 
interviews that discuss both “early personal aspirations" and “school antisemitism") 
can further winnow interview selections.27 Therefore, exploring this extensive 
body of material relies to a great degree on the archive's choice of index terms 
and how individual interviews are tagged with these terms. 
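
For readers who think more readily in terms of data structures, the logic of such an index can be sketched in a few lines of Python. This is an illustration only, not a description of the Shoah Foundation's actual system: the class names, field names, interview identifiers, and segment numbers are all hypothetical, and only the two subject terms and the three narrowing criteria are taken from the discussion above. The sketch shows subject terms keyed to interview segments, a search for coincident terms, and the narrowing of results by interview metadata.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interview:
    """Catalog-level metadata for one interview (field names are illustrative)."""
    interview_id: str
    language: str
    gender: str
    experience_group: str


@dataclass
class SegmentIndex:
    """Toy index: each subject term points to the interview segments tagged with it."""
    interviews: dict = field(default_factory=dict)  # interview_id -> Interview
    postings: dict = field(default_factory=dict)    # term -> {(interview_id, segment_no)}

    def add_interview(self, interview):
        self.interviews[interview.interview_id] = interview

    def tag(self, interview_id, segment_no, term):
        self.postings.setdefault(term, set()).add((interview_id, segment_no))

    def search(self, *terms, **criteria):
        """Return {interview_id: [segment numbers]} for interviews that are tagged
        with every given term and that match every metadata criterion."""
        if not terms:
            return {}
        ids_per_term = [{iid for iid, _ in self.postings.get(t, set())} for t in terms]
        matching_ids = set.intersection(*ids_per_term)
        results = {}
        for iid in sorted(matching_ids):
            interview = self.interviews[iid]
            if all(getattr(interview, key) == value for key, value in criteria.items()):
                segments = {seg for t in terms for i, seg in self.postings[t] if i == iid}
                results[iid] = sorted(segments)
        return results


def build_demo_index():
    """Populate the toy index with invented interviews and tags."""
    index = SegmentIndex()
    index.add_interview(Interview("int-001", "English", "female", "Jewish survivor"))
    index.add_interview(Interview("int-002", "Yiddish", "male", "Jewish survivor"))
    index.add_interview(Interview("int-003", "English", "male", "liberator"))
    index.tag("int-001", 4, "early personal aspirations")
    index.tag("int-001", 9, "school antisemitism")
    index.tag("int-002", 2, "school antisemitism")
    index.tag("int-003", 1, "early personal aspirations")
    index.tag("int-003", 1, "school antisemitism")
    return index
```

Calling `build_demo_index().search("early personal aspirations", "school antisemitism", language="English")` returns only the interviews tagged with both terms and recorded in English. Even in this toy version, an interview that no indexer tagged with a term cannot be found through that term, which is the dependence on indexing choices described above.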

Though the Shoah Foundation originally decided against transcriptions, it 
has recently begun to add them as a searchable resource that complements the 
index. The Foundation explains that “the transcripts ... appear on the screen as 
interviewees are talking so there will not be any loss of nuance of expression or 
paralinguistic cues.”28 The on-screen juxtaposition of video, indexing terms, transcription, 
plus mapping of locations mentioned, situates the recordings within a 
complex of search tools and supporting materials. This presentation also serves 
to remind viewers that the individual interviews they watch are part of a larger 
project. 

At the same time that it has expanded the scope of the VHA, the Shoah Foun- 
dation continues to explore new digital technologies for documenting and pre- 
senting Holocaust memories. Starting in 2012, the Foundation collaborated with 
University of Southern California's Institute for Creative Technologies to develop 
New Dimensions in Testimony. This project strives to enable people “to interact 
personally with testimony through holographic display" of a Holocaust survi- 
vor. Questions people ask the animate image of the survivor trigger prerecorded 
responses, by means of a "technology called Natural Language Understanding." 
New Dimensions in Testimony both extends and alters the Shoah Foundation's 
mission to document and disseminate survivors' life stories. Interactive hologra- 
phy shifts the mode of engaging survivors' memories from listening to and watch- 
ing the VHA's “talking-head” video interviews to interacting with three-dimen- 
sional, full-body apparitions of survivors. 

Interactive mediums are widely hailed for enhancing audience encounters 
with information and thereby empowering them to further action. According to 
the Institute for Creative Technologies, this project's goal is to offer people “sim- 
ulated, educational conversations with survivors though the fourth dimension 
of time. Years from now,” the institute writes, “long after the last survivor has 
passed on, the New Dimensions in Testimony project can provide a path to enable 
young people to listen to a survivor and ask their own questions directly.” At the 
same time that its creators champion the project’s use of “the latest technolo- 
gies,” they also assert that it “advances the age-old tradition of passing down 
lessons through oral storytelling."29 

Notwithstanding these claims, there are telling differences between convers- 
ing with an actual person and the simulation this project offers. Unlike video 
interviews of survivors, the technology used for New Dimensions in Testimony 
requires that the exchange between interviewer and informant be atomized into 
discrete questions and answers. Thus, Holocaust Studies scholar Rachel Baum 
argues, an interactive holograph of a survivor is, in effect, “the visual representation 
of a database” of information.30 Therefore, the exchange of questions and 
answers with the holograph is transactional rather than developmental. This 
alters the nature of the information provided as well as how it is received. Com- 
munications scholar Ekaterina Haskins argues that, when using interactive 
digital media, “the audience no longer acts as a consumer of a linear story [but] 
takes part in the experience by making choices.”31 Yet the holograph’s prerecorded 
life story in units of information, intended to answer individual questions, 
precludes the possibility of asking follow-up questions that probe or develop the 
survivor's narrative, which can be done when interviewing a living informant, as 
documented in audio and video interviews. 

This simulation of an actual conversation is limited both by the technology's 
occasional failure to offer apposite answers to questions and by the fact that the 
holograph cannot respond to a question that was not anticipated when recording 
the survivor's database of responses.32 Not hearing answers to one's questions is, 
of course, a limitation that viewers of video interviews with survivors might also 
experience. However, those recordings are manifestly closed works, document- 
ing an interaction one can audit but not enter. By contrast, the interactive holo- 
graph purports to provide the equivalent of a conversation with a living survivor. 
Video interviews provide an unfolding narrative, enabling the opportunity to 
observe the dynamics of memory. As literature scholar James Young notes, video 
foregrounds the process of remembrance by recording "both the witnesses as they 
make their testimony and the understanding and meaning of events generated 
in the activity of testimony itself." Consequently, observers of these videotapes 
"witness ... the making of testimony."33 By contrast, interactive holographs offer 
multiple potential points of entry to the survivor's life narrative, prompted by the 
asker's particular initiative. One need not, for example, start at the beginning 
of the survivor's life or of World War II, as some video interviews do, in keeping 
with the protocols for their respective projects. This open-ended interaction with 
the holograph relies on askers' knowledge of what to ask, as they must initiate 
each exchange. The challenge this interactive format poses to users is not limited 
to holography. Media scholar Anna Reading observes that visitors using interac- 
tive computer displays in the Simon Wiesenthal Center's Multi-media Learning 
Center in Los Angeles soon lose interest after making random choices of topics 
to pursue and generally “prefer to make interactive choices based around ‘what 
they already know.’” Interactivity, she notes, “is not the same as agency.”34 In the 
case of the holographs, the technology's novelty predominates. Engaging with it 
can prove enticing or daunting, as its presentation of a virtual person may seem 
either a wondrous or an unsettling phenomenon. 

Given this challenge, the Shoah Foundation recently announced a more 
directly pedagogical approach to engaging with survivor holographs. In an activ- 
ity called “A Conversation with Pinchas Gutter," students first learn about this 
survivor's background and practice interviewing skills; then they are invited to 
ask questions of Gutter's "interactive biography." The pedagogy for this project is 
as concerned with the process of interrogation as it is with the information 
students acquire from engaging with the holograph. As the Foundation explains, 
“Students will learn the techniques for having a conversation with a survivor and 
how to construct questions appropriately to elicit personal, historical and univer- 
sal thematic responses.” 

In yet another use of new media technology by the Shoah Foundation, Gutter 
is the subject of “The Last Goodbye,” described as an “immersive virtual reality 
testimony experience” that “represents unprecedented advances in storytelling 


33 James E. Young, Writing and Rewriting the Holocaust: Narrative and the Consequences of Inter- 
pretation (Bloomington: Indiana University Press, 1988), 159, 171. 

34 Anna Reading, “Digital Interactivity in Public Memory Institutions: The Uses of New Technol- 
ogies in Holocaust Museums,” Media, Culture & Society 25 (2003): 79, 73. 

35 USC Shoah Foundation, “A Conversation with Pinchas Gutter,” accessed January 22, 2021, 
https://iwitness.usc.edu/sfi/Activity/Detail.aspx?activityID=5391. 




through technology." In this 20-minute presentation, which debuted in 2017, 
users don virtual reality headsets to simulate accompanying Gutter on his "final 
return" to the former death camp at Majdanek, where he was imprisoned and 
his parents and sister were murdered.36 The technology employed in “The Last 
Goodbye" provides a distinct mode of accessing survivor storytelling, with its 
own assets and limitations. Here, connecting with the survivor is not interactive. 
Unlike "New Dimensions in Testimony," "The Last Goodbye" does not proffer a 
conversation with Gutter. Instead, the project's structure follows his monologue, 
organized by the sites visited. “The Last Goodbye" resembles survivors recount- 
ing their wartime experiences to tour groups visiting death camps. It thereby 
constitutes an effort to use new technology to simulate another waning practice 
of face-to-face encounters with survivors. Thus, when “The Last Goodbye" was 
presented at the Museum of Jewish Heritage in New York, the museum explained 
that, “as Pinchas recounts his experiences, you walk alongside him - seeing 
what he sees, hearing what he hears, and learning as he guides you through an 
account of his own history.”37 

As a consequence of this cascade of new digital technologies, students, teach- 
ers, museum goers, and others are offered multiple approaches to engage with the 
life stories of Holocaust survivors. Each technology facilitates a different kind of 
encounter, with its own possibilities and limitations. The fact that some survivors, 
such as Gutter, offer their personal histories in multiple formats provides oppor- 
tunities to reflect on the impact of each medium and format for recording and 
sharing a life story on both the survivors and their audiences. 


* 


Beyond their role as conveyers of information about the Holocaust and its remem- 
brance, these many resources are noteworthy in their own right as phenomena 
of digital humanities, especially for scholars who are interested in this evolving 
interrelation of new technologies with the documentation and study of the past. 
In addition, all users of digital materials such as the aforementioned examples 
can benefit greatly from observing that their mediation and remediation are 
intrinsic to these resources: 

First, by examining the protocols for how the materials, whether analog or 
digital, were collected: what mediums were used and how the possibilities 
and the value of these mediums were understood. Consider, for example, decisions 
to make video recordings of Holocaust survivors as talking heads against a blank 
background, in their homes, or in environments associated with their wartime 
experiences; the decision to include photographs, documents, or other personal 
objects in the interview and how they are incorporated. Such decisions as these 
shape how the survivors' narratives are mediated, as the narratives themselves 
are mediations of recalled experience. 

Second, by examining how digitized resources are remediated when they are 
cataloged and indexed, noting how the cataloging or indexing criteria shape the 
ways resources are located in a collection as well as how these criteria enable 
or constrain what is findable within individual resources. Though this metadata 
and the rubrics through which it is searched are extrinsic to collections of pho- 
tographs and audio or video recordings, this information serves as the point of 
entry to the resources themselves, defining their value. 

Third, by examining the implications of remediation: for example, the tran- 
scription or translation of audio recordings, the addition of annotations, images, 
maps, etc., to provide context for the original resources. Consider what these 
additions reveal about expectations of who will use these resources and to what 
ends, as well as what new research possibilities these tools prompt. Moreover, 
consider how the remediation of so many resources on the internet enables 
relationships among them, linking collections within one online mega-archive, 
while also imposing on the delivery of these wide-ranging materials the common 
medium of a website. 

Fourth, by examining the means used to access these resources, including 
the possible venues for this engagement and how they can shape users' encoun- 
ters with materials. Consider, for example, the differences among viewing the 
same digitized photograph or listening to the same song recording at home, in a 
classroom, in a library, or in a museum. 

Decisions about how Holocaust memories are documented and how these
documentations are remediated have important implications for scholarly
engagement with them. For example, communications scholar Amit Pinchevski argues that
the turn to videography to document Holocaust survivor narratives, beginning 
in the 1970s, facilitated the advent of a scholarly "discourse of trauma and
testimony." Pinchevski posits that the medium of videotape served as "the
technological unconscious," by enabling closer scrutiny of survivors' oral accounts
than had previously been possible. In my own work on the VHA, its index and
search mechanism enabled me to locate hundreds of instances in which survivors
discuss the role of Yiddish in their lives, from which I constructed a survey
of the dynamics of Yiddish as a presence in Holocaust survivors’ lives over the 
course of the 20th century. The result is a composite account, constructed from 
segments of different survivors' narratives, none of whom provides such an over- 
view. Rather, survivors discuss Yiddish only on occasion, usually just once, in the 
course of relating their life histories. In contrast to the narrative constructed from 
the aggregation of these discussions, the limited, inconstant presence of Yiddish 
in these narratives is telling in its own right, demonstrating the linguistic disrup- 
tion in 20th-century European Jewish life that is, in turn, emblematic of great 
cultural, political, geographic, and demographic upheavals.

The larger context of the digital turn is especially resonant for works of Hol- 
ocaust remembrance, as this moment coincides with the aging and passing away 
of survivors. Mounting concerns about how Holocaust education and remem- 
brance can proceed without survivors have prompted turns to these various 
mediations of their remembrances in order to extend survivors' presence, even 
striving to endow them with a kind of immortality. Yet the notion that digital
media ensure stable preservation of survivors' memories is a false friend. Digital 
media are distinctly mutable, composed of long strings of binary code that can 
easily become corrupted. These media are dependent on rapidly changing tech- 
nologies, whether equipment that quickly becomes obsolete or online platforms 
that suddenly change or disappear without a trace. 

The turn to ever newer technologies in the digital age highlights the speed 
with which new media become old. Eventually, their technology, sensibility, or 
aesthetic will appear less than state-of-the-art — just as Boder's interviewing 
methodology can now sound outdated, or videos made in the last decades of the 
20th century are evidently of an earlier era, marked by survivors' clothing, speech, 
and home furnishings. At the same time, the cascade of innovations in digital 
media can also present new opportunities for documenting Holocaust memories, 
and not only to institutions, but also to individuals. Just as the first videotaped 
life histories of survivors were made by their families almost a half-century ago, 
similar videos are now recorded on cell phones and posted on social media plat- 
forms. With this practice, the memories of older generations are integrated into 
the posters' ongoing online self-portraiture. 

The tension between the desires to preserve memory and the evolving, shifting
possibilities of digital media may seem challenging. However, this is emblematic
of memory generally, which is by its nature subjective, contingent, and relational.
Therefore, as we grapple with how to understand and to work with these
new kinds of resources, we need to look back in time to consider the dynamics 
of their development, even as we look forward, as new technologies, research 
interests, and practices of engagement with digital resources continue to emerge. 

The field of Jewish Studies has examined the impact of earlier new mediums 
and media practices on Jewish life, starting with the advent of writing in rela- 
tion to oral language in ancient times and extending through print, photography, 
sound recordings, and broadcasting. Just as reflecting on digital media in Holo- 
caust memory practices redounds to issues of concern regarding these newest 
technologies in other fields of Jewish Studies, the digital turn in this field res- 
onates with ongoing considerations of how prior engagements with media that 
were once new have figured in Jewish life and how it has been studied.


The Culture of the Very Rich and Very Poor:
Do Digital Museum Collections Tell us 
Anything about Jewish Culture? 


Abstract: Digital approaches to Jewish Studies allow collecting data from mul- 
tiple sources. This enlarges the picture, makes our understanding of historical 
experiences more complete, and triggers further research questions. This chapter 
compares samples of artworks from digital online collections of the Metropolitan 
Museum of Art in New York and the Russian State Catalogue of Museum Collec- 
tions in the Russian Federation. We show that the difference in their temporal 
and geographical coverage of Jewish artworks/historical documents often results 
from the tagging, naming, and understanding of what is deemed Jewish. The collective
decisions embodied in these datasets, made by many individuals and different
institutions about cataloguing, indexing, and returning searches on Jewish culture in a
digital age, determine which parts of Jewish culture are accessible. The technological
and data-led decisions become a part of multiple layers of decisions that inform 
how primary sources are formed in a digital age. 


Keywords: online museum collections, cultural heritage, indexing, cataloguing, 
bias, Metropolitan Museum of Art, Russian State Catalogue of Museum Collections 


1 Introduction 


Data-driven research in Jewish Studies has been used to analyze historical Hebrew
newspapers,¹ manuscripts,² and the ways Jewish content appears in cultural heritage
aggregators such as Europeana,³ as well as to evidence the politics of digitization as
revealed through textual collections.⁴


1 O. Soffer et al., “Computational Analysis of Historical Hebrew Newspapers: Proof of Concept,”
Zutot: Perspectives on Jewish Culture 17 (2020): 97-110.

2 M. Zhitomirsky-Geffet and Gila Prebor, “SageBook: Toward a Cross-Generational Social
Network for the Jewish Sages' Prosopography,” Digital Scholarship in the Humanities 34, no. 3
(September 2019): 676-95; G. Prebor, Maayan Zhitomirsky-Geffet, and Yitzchak Miller, “A New
Analytic Framework for Prediction of Migration Patterns and Locations of Historical Manuscripts
Based on Their Script Types,” Digital Scholarship in the Humanities 35, no. 2 (June 2020): 441-58.


Open Access. © 2022 Inna Kizhner et al., published by De Gruyter. This work is licensed under
the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://doi.org/10.1515/9783110744828-003

However, analysis has not yet focused on how Jewish
culture is represented within datasets of museum objects. The aim of this chapter is 
to present the results of an exploratory study to establish a methodology that lets us 
understand Jewish culture (or other minority cultures) as represented within digital 
collections in different parts of the world. The comparison of search results for the 
Metropolitan Museum of Art in the United States and those for the State Catalogue 
of Museum Collections of the Russian Federation will allow us to understand the 
effect that a variety of collecting and cataloging approaches may have on research 
at the crossroads of Digital Humanities and Jewish Studies. This will also help us 
to examine how the status of a minority population within different societies has 
affected the institutional and collections' response to their culture. In doing so, we 
can see a diversity of cataloging approaches for different cultural environments and 
the effects of data-driven analysis when studying minority cultures. 

Bias in cultural heritage data and representations of human knowledge in
mass-digitized collections have been widely covered in recent literature.⁵ Such
work also relates to epistemic complications of data collecting, data cleaning,
and model training,⁶ and the difficulties imposed by “the infrastructures of
knowledge-making.”⁷


3 Dov Winer, “Judaica Europeana: An Infrastructure for Aggregating Jewish Content,” Judaica
Librarianship 18 (2014): 88-115.

4 G. Zaagsma, “Digital History and the Politics of Digitization,” paper presented at Digital
Archive and Canon Workshop, March 10, 2021, accessed May 11, 2021,
https://www.digitales-archiv-und-kanon.de/contributions/Zaagsma_en.pdf.

5 See, for example, B.H. Daru et al., “Widespread Sampling Biases in Herbaria Revealed from
Large-Scale Digitization,” New Phytologist 217 (2018): 939-55; N.B. Thylstrup, The Politics of Mass
Digitization (Cambridge, MA: MIT Press, 2019); K. Bode, “Why You Can’t Model Away Bias,” Modern
Language Quarterly 81, no. 1 (2020): 95-124; A. Liu, “Toward a Diversity Stack: Digital Humanities
and Diversity as Technical Problem,” PMLA/Publications of the Modern Language Association of
America 135, no. 1 (2020): 130-51; S. Bagga and A. Piper, “Measuring the Effect of Bias in Training
Data for Literary Classification,” Proceedings of LaTeCH-CLfL 2020, Barcelona, Spain, December
12, 2020, 74-84.

6 A. Bechmann and G.C. Bowker, “Unsupervised by Any Other Name: Hidden Layers of
Knowledge Production in Artificial Intelligence on Social Media,” Big Data & Society 6, no. 1 (2019):
1-11.

7 B. Mak, “Archaeology of a Digitization,” Journal of the Association for Information Science and
Technology 65, no. 8 (2014): 1515-26, 1519, cited in Bode, “Why You Can’t Model Away Bias,” 2.

8 Maria Economou, “Heritage in the Digital Age,” in A Companion to Heritage Studies, ed.
William Logan, Máiréad Nic Craith, and Ullrich Kockel (Chichester, UK: Wiley, 2016), 215-28; Bode,
“Why You Can’t Model Away Bias”; T. Hauswedell et al., “Of Global Reach Yet of Situated
Contexts: An Examination of the Implicit and Explicit Selection Criteria that Shape Digital Archives
of Historical Newspapers,” Archival Science 20 (2020): 139-65; Zaagsma, “Digital History and the
Politics of Digitization.”


Apart from selection and exclusion bias,⁸ these “infrastructures of knowledge-making”
include misrepresentations caused by metadata conventions,⁹ cataloging approaches,¹⁰
classification principles,¹¹ and linguistic issues.¹² How things are named, classified,
and cataloged, and what contexts are selected to denote the meaning of artworks and
documents, strongly influence whether users are able to find minority cultures in the
wealth of digitized cultural heritage we have. Perspectives and surrounding contexts
may vary in different cultural environments as they tend to reflect the values of
catalogers during a historical period. Only by understanding how cataloging principles
and linguistic approaches influence the results of data retrieval can we develop a
methodology for studying minority cultures in contexts that combine big data
approaches and epistemic dependence on the historical perceptions of data. We need to
make explicit the principles that govern how information infrastructures shape our
perceptions of Jewish culture and other cultures. We are always at risk of losing what
we do not count and losing what we do not help users to see.¹³ On the other hand,
whatever classification instruments and cataloging principles we employ, some
cultural objects will always tend to be left behind. This happens because multiple
“sieves” or instruments have different users in mind.¹⁴ This means that, due to
their coding cultures and functionality, they tend to exclude some objects, and
therefore potential users, along the way. These lost objects and associated cultural
contexts may be important regarding minority cultures where the risk of
losing contexts is especially high.¹⁵


9 M.V. Fernandez, “The Coloniality of Metadata: A Critical Data Analysis of the Archive of Early
American Images at the John Carter Brown Library,” PhD thesis, University of Texas, 2018;
Zaagsma, “Digital History and the Politics of Digitization.”

10 N.J. Bingham and H. Byrne, “Archival Strategies for Contemporary Collecting in a World of
Big Data: Challenges and Opportunities with Curating the UK Web Archive,” Big Data & Society,
January 2021.

11 G.C. Bowker and S.L. Star, Sorting Things Out: Classification and Its Consequences
(Cambridge, MA: MIT Press, 1999); K. Cotter et al., “‘Reach the Right People’: The Politics of ‘Interests’
in Facebook’s Classification System for Ad Targeting,” Big Data & Society, January 2021.

12 J. Aguilera, “Another Word for ‘Illegal Alien’ at the Library of Congress: Contentious,” The
New York Times, July 22, 2016, accessed April 25, 2021,
https://www.nytimes.com/2016/07/23/us/another-word-for-illegal-alien-at-the-library-of-congress-contentious.html.

13 G.C. Bowker, “Biodiversity Datadiversity,” Social Studies of Science 30, no. 5 (2000): 643-83.

14 J. Likhter, personal communication, May 7, 2021.

15 J. Likhter, personal communication, May 7, 2021.


2 Methods


Our sources of data for analysis were the entirety of the online digital collections
of the Metropolitan Museum of Art¹⁶ in New York and the State Catalogue of the
Museum Collections in the Russian Federation.¹⁷ Our choice of online digital
collections from two different countries and different contexts in cultural heritage
was determined by our aim to see the difference (if any) in search results for
“jew,” “jewish,” and related search terms in different settings. We sought to learn
whether and how the representation of objects reflected political, social, and
epistemic attitudes in these different parts of the world. We deliberately paired a
leading Western institution with a national catalog to amplify the difference in
approach and so test whether this methodology would yield results. There is no
national catalog in the USA against which to compare the Russian catalog, so the
Metropolitan Museum of Art was chosen given that it “collects, studies, conserves,
and presents significant works of art across all times and cultures in order to
connect people to creativity, knowledge, and ideas.”¹⁸ Both collections detail a
significant number of objects published online, are accessed by large numbers of
users, and allow similar access to metadata, which facilitated comparison. The
online digital collection of the Metropolitan Museum of Art included over 400,000
objects in 2017¹⁹ and attracted 8 million visitors in 2018.²⁰ The Russian State
Catalogue includes images
for over 24 million museum objects at the time of writing, which makes it one of 
the largest national aggregators of cultural heritage across the world. It was built 
for inventory purposes, representing a third of Russian museum collections cov- 
ering almost all state museums across the country. This makes it a representative 
dataset to study Jewish culture in the Russian environment. The comparison of 
results for the Metropolitan Museum of Art and those for the Russian State Cat- 
alogue will allow us to examine how the status of a minority population within 
different societies and contexts has affected the institutional and collections' 
response to culture in societies with different political and collections traditions. 
We did not compare the results for the Russian State Catalogue with those from 


16 https://www.metmuseum.org/, accessed April 25, 2021. 

17 https://goskatalog.ru/portal/, accessed April 25, 2021. 

18 Metropolitan Museum of Art, “About the Met,” 2021, accessed June 25, 2021,
https://www.metmuseum.org/about-the-met.

19 T. Navarrete and E. Villaespesa, “Digital Heritage Consumption: The Case of the Metropolitan
Museum of Art,” magazén 1, no. 2 (December 2020).

20 Metropolitan Museum of Art, “Met Museum Sets New Attendance Record with More Than 7.35
Million Visitors,” 2018, accessed May 12, 2021,
https://www.metmuseum.org/press/news/2018/met-museum-sets-new-attendance-record.

a large international aggregator of cultural heritage images, such as Google Arts
and Culture or Europeana Collections. This is because we wanted to focus on the 
epistemic path dependence, and the political and social attitudes that national 
and institutional digital infrastructures maintain in producing knowledge. The 
content of aggregators can be studied at a later date so that we see a combined 
approach to representations of Jewish culture or other minority cultures. 

We queried the collections' search engines for museum objects matching the search
terms “jew,” “jewish,” “hebrew,” and “yiddish.” For the Russian State Catalogue, we
used “еврей” as a lemma, because the Catalogue's search engine returns matches for
parts of words, which matters for a language with numerous word forms, such as
Russian. The other Russian search terms were “иврит” for “hebrew,” “идиш” for
“yiddish,” and “древнееврей(ский)” for “Old Jewish,” which is an equivalent of
“hebrew” in religious, biblical, or literary contexts. We did not use the lemma
“иудей” or “Judaic” (as opposed to “Christian”), although this search term returned
over 500 results for the Russian State Catalogue and two results for the Metropolitan
Museum of Art's online digital collection. Nor did we use results for “Judaism,”
although this search term returned 200 results for the Metropolitan Museum of Art
and 24 results for the Russian State Catalogue.

As of October 2020, our search yielded approximately 900 results for the 
Metropolitan Museum of Art's digital collection and about 6,300 results for the 
Russian State Catalogue (see Tables 1 and 2 for a detailed breakdown of results). 
To get the results that indeed influence the perception of Jewish culture, we used 
only the “artworks with images” option on the websites. We used a sample (10%) 
of the results from the Russian State Catalogue which we made more represent- 
ative by conducting a random search in the 12 categories that were offered by 
the Catalogue's interface, such as "paintings," "sculpture," “graphics,” and “rare 
books." After filtering out duplicates and irrelevant results, we were left with 407 
records for the Metropolitan Museum of Art, while the sample from the Russian 
State Catalogue yielded 404 records after excluding duplicates. We tabulated the 
metadata and data from textual descriptions for the results of the search. The 
fields included the name and culture of artists, types of objects, dates or time 
periods, and geographical descriptions. We also included data on whether the 
artwork was or had been a part of an exhibition in the Jewish Museum in New 
York or any other results for an exhibition that contextualized search results. This 
could only be done for the Metropolitan Museum of Art as the Russian State Cat- 
alogue does not record this information. We preferred to include the geography 
that was not directly related to the place where an artwork was produced because 
it enabled us to contextualize the Jewish culture showing the breadth and depth 
of linkages. The textual data for the Metropolitan Museum of Art were a combination
of metadata, extensive historical contexts revealed in textual descriptions,
and exhibition histories that placed an artwork in a variety of cultures
(Figure 1). The information for the Russian State Catalogue was obtained from 
the titles (which were sometimes quite extensive), the date, and the place where 
an artwork was produced (Figure 2). In this way, the interface of the two digital 
collections structured, directed, and limited our results (see Section 3 for anal- 
ysis). Biographical details for the artists whose artworks were returned follow- 
ing our search in the Metropolitan Museum of Art online digital collection were 
obtained from Wikipedia. The maps showing the distribution of search results 
and their historical contexts across space were produced using Adobe Illustrator. 
We obtained contemporary maps through the Yandex Maps service.²¹ Maps of historical
places, such as Mesopotamia, were more difficult to produce due to the uncertainty
about their borders. Uncertain borders, derived from popular sources, such
as Wikipedia, are shown as blurred lines to reflect this. 
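
The sampling step described above can be illustrated with a minimal Python sketch. It assumes one possible reading of the procedure: duplicates are dropped first, and roughly 10% of the records are then drawn at random from each interface category. The record layout, the field names `id` and `category`, and both helper functions are hypothetical; the chapter does not state how the sample was drawn in practice.

```python
import random


def dedupe(records, key="id"):
    """Keep the first record seen for each identifier."""
    seen, unique = set(), []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


def stratified_sample(records, fraction=0.10, category_key="category", seed=0):
    """Draw roughly `fraction` of the records from every category."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_category = {}
    for rec in records:
        by_category.setdefault(rec[category_key], []).append(rec)
    sample = []
    for category in sorted(by_category):
        group = by_category[category]
        k = max(1, round(len(group) * fraction))  # at least one per category
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical search results: three categories of 20 records each,
# plus two duplicated hits that should be dropped before sampling.
hits = [
    {"id": f"{cat}-{i}", "category": cat}
    for cat in ("paintings", "graphics", "rare books")
    for i in range(20)
]
hits += [hits[0], hits[25]]

unique_hits = dedupe(hits)
sample = stratified_sample(unique_hits)
print(len(hits), len(unique_hits), len(sample))
```

Sampling within each category, rather than from the pooled results, keeps small categories from disappearing from the sample, which is one way to read "more representative" here.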


Table 1: The number of records and metadata returned by the search engine of the Metropolitan 
Museum of Art’s online digital collection when using “jewish,” “jew,” “hebrew,” and "yiddish" 
as search terms in autumn 2020. 


Search terms    Number of records with metadata and images
                returned by the search engine

“jewish”        496
“jew”           210
“hebrew”        194
“yiddish”         4


Table 2: The number of records and metadata returned by the search engine of the Russian 
State Catalogue's online digital collection when using “jewish,” “jew,” “hebrew,” “yiddish,” 
and "old jewish" as search terms in autumn 2020. 


Search terms                                            Number of records with metadata and images
                                                        returned by the search engine

“еврей” as a part of the word for “jew” and “jewish”    4,687
“идиш” for “yiddish”                                    1,510
“иврит” for “hebrew”                                       96
“древнееврей(ский)” for “old jewish”                       33
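
As a quick arithmetic check on Tables 1 and 2, the following Python sketch sums the per-term counts and compares them with the approximate totals quoted in the text (about 900 and about 6,300). The counts are copied from the tables; the variable and function names are illustrative only, and the totals are raw hits before duplicates and irrelevant results were removed.

```python
# Record counts per search term, copied from Tables 1 and 2 (autumn 2020).
MET_COUNTS = {"jewish": 496, "jew": 210, "hebrew": 194, "yiddish": 4}
RSC_COUNTS = {
    "еврей": 4687,
    "идиш": 1510,
    "иврит": 96,
    "древнееврей(ский)": 33,
}


def total_records(counts):
    """Sum raw hits across search terms (duplicates are not removed)."""
    return sum(counts.values())


met_total = total_records(MET_COUNTS)
rsc_total = total_records(RSC_COUNTS)
print(met_total, rsc_total)
```

Because a single object can match several search terms, these sums are upper bounds on the number of distinct records, which is consistent with the smaller figures reported after duplicates were filtered out.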


21 Accessed April 25, 2021, https://yandex.ru/maps/?ll=92.852572%2C33.461352&z=2.


[Figure 1 image: screenshot of the Metropolitan Museum of Art online collection record “Pair of Torah finials (rimonim),” ca. 1740-50, European Sculpture and Decorative Arts.]

Figure 1: An example of a record returned by the search engine of the Metropolitan Museum of 
Art's online digital collection when using “hebrew” as a search term. Images, metadata, and a
textual description showing a pair of Torah finials. The Metropolitan Museum of Art, New York.
Walter and Leonore Annenberg Acquisitions Endowment Fund, 2016.


Figure 2: An example of a record returned by the search engine of the Russian State Catalogue's
online digital collection when using “old jewish” as a search term. An image with metadata
showing Wilhelm Schickard's book with rules for the Old Jewish language. Vologda State Museum
of History, Architecture and Art.


3 Results

3.1 Geographical Distribution of Artworks 


Figure 3 shows the geographical distribution of artworks in our sample for the 
search conducted on the Metropolitan Museum of Art website. We can see a wide 
geographical coverage. The artworks related to the USA dominate in the sample 
but we can also see artworks related to Europe, Russia, Egypt, Israel, Northern 
Africa, and Mesopotamia. Figure 4 shows the results for the geographical dis- 
tribution in our sample for the search conducted on the website of the Russian 
State Catalogue. The artworks in the sample are almost exclusively produced in, 
published in, or related to places within Russia or countries that used to be a 
part of the Russian Empire or the Soviet Union. This representation is unrelated 
or weakly related to either historical or contemporary cultures but rather shows 
Jews as a separate ethnicity in Russian society. Conversely, in the representation 
of the Metropolitan Museum of Art, Jewish culture is embedded in its relations 
with neighboring countries and cultures. More importantly, the Metropolitan
Museum of Art's online digital collection represents Jewish culture as a part
of the Western canon²² through its relation to biblical geography and the nations
that interacted with and influenced Jewish culture from the foundations of their
society. This evidence supports a wide historical span and is augmented with the
evidence on 20th-century artists closer to the newer end of the canon (Figure 5).
The results for the distribution of the Metropolitan Museum of Art artworks 
differ dramatically from the results of the Russian State Catalogue. This happens 
because contexts for “Jewish” and “Hebrew” differ in the two online digital col- 
lections and the words that are used as search terms, as a result, often mean dif- 
ferent things. This is connected with different histories of Jews, different cultural 
environments, different attitudes to reconstructing histories, different cataloging 
approaches and different meanings of the search terms in the two countries. To a 
large extent, the difference is determined by what is deemed Jewish by catalogers 
and institutions in the two countries. It is also explained by the structure of the 
interfaces of the two digital collections and by how (whether) the artworks are 
contextualized through exhibitions or titles (see Section 2). 


22 N. Frye, The Great Code: The Bible and Literature (San Diego: Harcourt Brace Jovanovich,
1982); H. Bloom, The Western Canon (New York: Harcourt Brace, 1994).


3.2 Distribution of Artworks Across Time


Figure 5 shows the difference in the distribution of artworks across time for the 
two samples. It demonstrates a pronounced peak in early 20th-century artworks 
from the Metropolitan Museum of Art, and an even distribution going back to 
the fourth millennium BC. Combined with a significant number of avant-garde 
Jewish artists in this dataset (see Section 3.3), the results show a span across 
time and space that ranges from the “exemplary ancient” to “model moderns.”²³
This happens because the Metropolitan Museum of Art has a metadata field 
that mentions all the places where artworks were exhibited. If such a place has 
“jewish” in its title, such as the Jewish Museum in New York, the artwork will be 
returned as a search result by the Metropolitan Museum of Art search engine. As 
a result, artworks are linked to the exhibitions at the Jewish Museum in New York 
in the period between 1956 and 2018,²⁴ and the Biblical Archaeological Exhibition
at the University of Wisconsin-Madison Department of Hebrew and Semitic
Studies (1975).²⁵ The links to the “Colmar Treasure” exhibition at the Metropolitan
Museum of Art in 2019²⁶ show the possessions of a Jewish community in medieval
Germany. The results linked to such exhibitions relate to 157 search results or
about 40% of the dataset. Tagging systems and classification approaches mean
that artworks from Assyrian, Babylonian, Sumerian, medieval, and contemporary 
cultures are included in the artworks returned from our queries and are therefore 
present in the dataset and analysis. Conversely, the Russian State Catalogue does 
not have this functionality and, consequently, the temporal and geographical 
spans of the sample are much narrower due to a lack of information and context. 
Figure 5 shows that artworks related to Jewish identity in the Russian State Cata- 
logue date back to the 17th century and have little relation to biblical contexts or 
to the part of the European cultural canon that is associated with the history of 
Mesopotamia and ancient civilizations. The trough in the temporal distribution in 
the 1940s-1960s in the artworks from the Russian State Catalogue, which is most 


23 Yuri Slezkine, The Jewish Century (Princeton, NJ: Princeton University Press, 2002). 

24 See, for example, Jewish Museum, New York, "Russian Jewish Artists in a Century of Change, 
1890-1990," September 21, 1995-January 28, 1996. 

25 University of Wisconsin-Madison Department of Hebrew and Semitic Studies, “The Book and 
the Spade" (Biblical Archaeological Exhibition), April 13-May 4, 1975. 

26 Metropolitan Museum of Art, “The Colmar Treasure: A Medieval Jewish Legacy,” July 22,
2019-January 12, 2020, accessed April 30, 2021,
https://www.metmuseum.org/exhibitions/listings/2019/colmar-treasure-medieval-jewish-legacy.


Figure 5: Temporal distribution of collections related to Jewish Studies in the Metropolitan
Museum of Art and the Russian State Catalogue. Graph: Daniil Skorinkin, Inna Kizhner, Melissa 
Terras, Julia Afanasieva, Diana Pusenkova, and Maria Sherer. 


pronounced directly after World War II, is explained by the famous anti-Jewish
campaigns in the Soviet Union at the end of the 1940s.²⁷
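
The effect of search scope described in this section can be sketched in a few lines of Python. The records, field names, and `search` helper below are hypothetical and do not reproduce either institution's search engine; they only illustrate how indexing an exhibition-history field changes what a query for “jewish” returns.

```python
def search(records, term, fields):
    """Return titles of records whose chosen fields contain the term."""
    needle = term.lower()
    return [
        rec["title"]
        for rec in records
        if any(needle in rec.get(field, "").lower() for field in fields)
    ]


# Hypothetical catalog records. The second record mentions "Jewish"
# only in its exhibition history; the third mentions it in its title.
records = [
    {"title": "Cylinder seal", "description": "Mesopotamian seal",
     "exhibitions": ""},
    {"title": "Abstract composition", "description": "Oil on canvas",
     "exhibitions": "Jewish Museum, New York"},
    {"title": "Jewish wedding ring", "description": "Gold ring",
     "exhibitions": ""},
]

# Broad scope: the exhibition-history field is searched as well.
with_exhibitions = search(
    records, "jewish", ["title", "description", "exhibitions"]
)
# Narrow scope: no exhibition-history field is available to search.
without_exhibitions = search(records, "jewish", ["title", "description"])

print(with_exhibitions)
print(without_exhibitions)
```

Under the broad scope, an object with no Jewish subject matter in its own description is still returned because of where it was exhibited, which is the mechanism the text identifies behind the wider temporal and geographical span of the Metropolitan Museum of Art sample.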


3.3 Distribution of Artworks by Topics and Types of Art 


As shown in Figure 6, Jewish identity in the sample of search results returned 
by the Metropolitan Museum of Art's search engine is linked to contemporary 
artists, biblical contexts, and Mesopotamia (ancient civilizations). The percep- 
tion of “Jewishness” is broadened by the inclusion of relations between Jews and 
Muslims or Sephardic Jews. On the other hand, Figure 6 shows that Jewish identity
in the sample of search results returned by the Russian State Catalogue relates
to high Yiddish culture and is linked to Jewish (Yiddish) theater, literature, and
music. This is because the Russian State Catalogue returns numerous results for
the Jewish State Theatre, where plays were performed in Yiddish (over 1,500 records,
or more than 30% of the search results for “Jewish” as a search term). Other results
from the sample are linked to books translated from (into) Yiddish and musical 
scores where texts were written in Yiddish. 


27 See, for example, B. Pinkus, The Jews of the Soviet Union: The History of a National Minority
(Cambridge: Cambridge University Press, 1988).

Figure 7 shows the distribution of records by type of art for the most frequent
types in the Metropolitan Museum of Art and Russian State Catalogue. We can see 
the prevalence of books for the Russian State Catalogue and the dominance of 
paintings for the Metropolitan Museum of Art. This cannot be explained by the fact 
that the Russian museums and, consequently, the Russian State Catalogue collect 
books rather than paintings. Indeed, paintings exceed books in the collection of 
the Metropolitan Museum of Art but there are twice as many paintings as books 
in its online digital collection.²⁸ However, the dominance of paintings over books 
from this online digital collection is much more pronounced in the artworks related 
to Jewish identity in our sample. The share of books related to Jewish artworks in 
the sample of search results from the Russian State Catalogue is much greater than 
the share of all books in the Russian State Catalogue.²⁹ This happens because texts 
accompanying images in the State Catalogue are descriptive and rarely contextualize 
artworks, while books come with extensive titles, quite often with the note 
“translated from Jewish (Yiddish).” This means that biblical paintings come without 
an explanation of who is depicted and what biblical story inspired the artwork. In 
addition, in the Russian State Catalogue, textual descriptions that are not part 
of the title are not among the fields used for retrieval. This means that even 
if Judith, the Jewish heroine, is presented as such in the textual description, the 
image and metadata will not be returned as search results for the terms “Jewish” 
or “Hebrew.” Conversely, the Metropolitan Museum of Art's search engine retrieves 
artworks if the search term is mentioned in any of the metadata fields displayed 
to the user. This explains why "paintings" is the most frequent category for the 
topical distribution of search results for the Metropolitan Museum of Art. A third of 
the paintings from the Metropolitan Museum of Art in our sample involve biblical 
and Christian themes and contexts. Two-thirds of the 93 paintings in the sample 
are 20th-century artworks, and half of them are by American Jewish artists, such 
as Max Weber (1881-1961). A quarter of them are by French Jewish artists, such 
as Chaim Soutine (1893-1943) or Marc Chagall (1887-1985). On the other hand, 
famous contemporary artists in the Russian State Catalogue are rarely retrieved 
as Jewish. While the Metropolitan Museum of Art's search engine retrieves 11 of 
Chagall’s paintings tagged as Jewish in data annotations (19% of 59 Chagall works 
in the online digital collection), the Russian State Catalogue's search engine 
produces four works tagged as Jewish (2% of 185 records related to Chagall in this 
online digital collection). In the search results for the Metropolitan Museum of Art, 


28 Metropolitan Museum of Art, “Explore the Collection," 2021, accessed June 25, 2021, https:// 
www.metmuseum.org/art/collection. 

29 Russian State Catalogue (The State Catalogue of Museum Collections of the Russian Federa- 
tion), 2021, accessed June 25, 2021, https://goskatalog.ru/portal/#/. 
a significant proportion of contemporary paintings (42 records out of 57 artworks, 
or about 70% of the sample) was produced by artists who were born in the Russian 
Empire, such as Marc Chagall or Max Weber. This group also includes artists whose 
parents emigrated from the Russian Empire at the end of the 19th and the beginning 
of the 20th century. The textual descriptions for these records could have mentioned 
Yiddish, since these painters were from families where the Yiddish language 
and culture were a part of their life and a significant source of influence.³⁰ However, 
users can rarely retrieve the word Yiddish in the curatorial texts and data annotations 
of the Metropolitan Museum of Art's online digital collection. The “Yiddish” 
search term returns four artworks, or about 1% of the sample of search results 
for the Metropolitan Museum of Art. This suggests that until recently 
Yiddish carried a “wide association with marginality, mutability, or obsolescence,”³¹ 
and the museum community preferred to focus on other representations of Jewish 
culture. Conversely, searching for “Yiddish” in the Russian State Catalogue produces 
1,510 results or 24% of results for the three search terms. However, nearly 
all the results (95% or 1,414 objects out of 1,510 results for this search term) are 
retrieved from the Regional Museum of the Jewish Autonomy in the Far East, reducing 
Yiddish culture to the idea of Jewish territorialism.³² 
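
The retrieval difference described in this section, where one search engine matches a term only against titles while the other matches it against every displayed metadata field, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the records, the field names `title` and `description`, and the `search` function are invented for the example and do not reproduce the data model or search engine of either catalogue.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    description: str


# Invented toy records, for illustration only (not taken from either catalogue).
RECORDS = [
    Record("Judith", "The Jewish heroine Judith, from the biblical Book of Judith."),
    Record("Stories, translated from Jewish (Yiddish)", "Printed book."),
    Record("Landscape with a river", "Oil on canvas."),
]


def search(records, term, fields):
    """Return titles of records in which any of the given fields contains the term."""
    needle = term.lower()
    return [
        r.title
        for r in records
        if any(needle in getattr(r, f).lower() for f in fields)
    ]


# Hypothetical "title-only" retrieval versus retrieval over all metadata fields.
title_only = search(RECORDS, "Jewish", fields=("title",))
all_fields = search(RECORDS, "Jewish", fields=("title", "description"))

print(title_only)   # the painting of Judith is missed
print(all_fields)   # the painting of Judith is found via its description
```

With title-only retrieval, the book is returned because its title carries the note "translated from Jewish (Yiddish)," while the painting is invisible to the query even though its description names a Jewish heroine. Searching all fields returns both, which mirrors the contrast between books and paintings discussed above.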


3.4 Linguistic Differences Related to Search Results 


One of the results reflecting the difference in linguistic approaches is the use 
of “Old Jewish” (древнееврейский) to write about “Hebrew” and the use of 
“Jewish” (еврейский) to write about “Yiddish” in the 20th century in the results 
from the Russian State Catalogue. This observation agrees with literature showing 
that “Jewish” was used in Russian/Soviet publication standards to denote “Yid- 
dish”³³ and with literature on the eradication of Hebrew publishing in the 1920s in 


30 See, for example, B. Harshav and M. Chagall, Marc Chagall and His Times: A Documentary Nar- 
rative (Palo Alto, CA: Stanford University Press, 2004); J. Shandler, Yiddish: Biography of a Language 
(Oxford: Oxford University Press, 2020). 

31 J. Shandler, Adventures in Yiddishland: Postvernacular Language and Culture (Berkeley: Uni- 
versity of California Press, 2006). 

32 Ellen Eisenberg, Jewish Agricultural Colonies in New Jersey, 1882-1920 (Syracuse, NY: Syracuse 
University Press, 1995). 

33 L. Kogan and S. Loesov, “Old Jewish Language,” in World Languages: Semitic Languages. 
The Akkadian Language. Northwest Semitic Languages, ed. A. Belova, L. Kogan, S. Loesov, and 
O. Romanova. Russian Academy of Sciences. Institute of Linguistics (Moscow: Academia, 2009), 
296-375 [Russian]. 
Figure 6: Distribution of records by topics for the most frequent types in the Metropolitan 
Museum of Art and the Russian State Catalogue. Graph: Julia Afanasieva, Inna Kizhner, 
Melissa Terras, Diana Pusenkova, Maria Sherer, and Daniil Skorinkin. 

Figure 7: Distribution of records by type of art for the most frequent types in the 
Metropolitan Museum of Art and the Russian State Catalogue. Graph: Julia Afanasieva, 
Inna Kizhner, Melissa Terras, Diana Pusenkova, Maria Sherer, and Daniil Skorinkin. 
Russia.³⁴ The latest item described with the expression “translated into Old Jewish” 
in the Russian State Catalogue is a book of Chekhov's stories published in 1918. The 
author of the earliest book in our dataset that uses “Jewish,” meaning “Yiddish,” 
includes a caveat, “written in Jewish-German dialect.” The book was published 
in 1913. This means that a search for “Jewish” among results dating to the 
20th century from the Russian State Catalogue will almost always return results 
that mean “Yiddish,” and these do not include biblical contexts, religious contexts, 
or the contexts of ancient civilizations. Such confusion of words and terminology 
related to the circumstances and environments of Jewish culture definitely influenced 
what was deemed Jewish, and what aspects of Jewish life were emphasized. 

Different contexts in which the results of the search are displayed imply that 
they reflect different attitudes and, probably, different understandings of the Jewish 
identity in the two digital collections. The contexts in the Russian State Catalogue 
are very much about the Yiddish speaking population of the Russian Empire and 
the Soviet Union, using “the language of the Jewish street,” while the contexts 
of the Metropolitan Museum of Art are very much about Jewish identity as a part 
of the European cultural canon and post-colonial perception of Jews and Muslims 
or Jews in Northern Africa, such as in the exhibition “The Cairo Geniza: Jews & 
Muslims in the Mediterranean World 800-1500” about Jews in Egypt.³⁶ However, 
sometimes the results related to this part of Jewish identity are mixed with the 
representation of colonialism and orientalism.³⁷ In this sense, the results from the 
Metropolitan Museum of Art are in line with the results from the Russian State Cat- 
alogue displaying colonial photography from Georgia, Dagestan, and Uzbekistan 
at the turn of the 20th century. 

Our results show different ways of representing Jewish identity in metadata 
and collection management systems. In this way, Russian cataloging approaches 
or cataloging voices show differences in cultural or social attitudes that date back 
to the very first years after the Russian Revolution. They are very much about 
the formal policy and representation of Jewishness through the Yiddish language 
and literature. In the case of the Metropolitan Museum of Art, Jewish identity is 
given a broader context that encompasses biblical culture, ancient civilizations, 
medieval European culture, Northern Africa, and an age of modernity showing 
contemporary Jewish art in Europe and the United States (mostly in New York). 


34 A. Blyum, “Hebrew Publications and the Soviet Censor in the 1920s,” East European Jewish 
Affairs 23, no. 1 (1993): 91-99. 

35 Blyum, “Hebrew Publications and the Soviet Censor in the 1920s,” 98. 

36 Jewish Museum, New York. “The Cairo Geniza: Jews and Muslims in the Mediterranean World 
800-1500.” January 1, 1997-October 12, 1997. 

37 Edward W. Said, Orientalism (New York: Random House, 1978). 
4 Conclusion 


This chapter showed that comparing metadata and catalog searches in this manner 
provides a methodology that lets us understand the representation of Jewish 
culture (or other minority cultures) by digital collections in different parts of the 
world, and in different contexts. Our results demonstrate that digital cultural her- 
itage searches are complicated by a variety of cataloging voices and classification 
principles. The evidence that supports the argument of this chapter demonstrates 
how the layers of debates on “what is Jewish?" intertwine through collecting, 
tagging, linking, and publishing artworks online in very different contexts. Gen- 
eralizations that can follow such analysis at scale vary widely, depending on 
political, cultural, and social perspectives. These perspectives are reflected in 
tagging principles, word usage, and classification approaches. What constitutes 
the datasets under analysis, and what texts, links, and associated concepts are 
involved in producing contexts, determines our conclusions. Generalizations that emerge 
as a result produce contrasting, entirely different representations of the objects 
under study, as shown in this chapter. One cannot but agree with Andrew Piper's 
argument about our “failure to generalize well” and about the crisis of reproducibility 
in the humanities based on “case-driven research.”³⁸ Our chapter argues, 
however, that even analysis at scale does not contribute to better generalizations. 
To a certain extent, this happens because of a variety of cataloging conventions. 
Of course, the difference in results between the two chosen sources can also be 
attributed to different collection practices, since this exploratory research compares 
the curated collection of a large institution (the Metropolitan Museum of 
Art) with the portal to the collections of an entire nation (the State Catalogue of 
Museum Collections of the Russian Federation). Further work is needed on collection 
practices and processes, and on how these approaches may have affected 
individual or national collections at different points in time, and we stress that 
the choice of which collections to compare is a crucial part of this design 
methodology. 

The existence of multiple views and multiple approaches to museum anno- 
tation is not in itself a problem if our aim is to show that the truth is constituted 
by numerous contexts and does not depend on singular political and historical 
circumstances of building a collection. What is important, however, is making 
these circumstances and their influence explicit. In doing so we can demonstrate 


38 A. Piper, Can We Be Wrong? The Problem of Textual Evidence in a Time of Data (Cambridge: 
Cambridge University Press, 2020), 4—5. 
how data models at the foundation of information systems depend on “ethical 
and political values, modulated by local administrative procedures.”³⁹ 

One of the ways of avoiding the complications of subjective interpretations 
and the consequent differences in data models is reducing classifications to a list of 
categories from pre-developed ontologies, as is done in the sciences. However, literature 
on biological databases in recent years has been devoted to the importance 
of contexts, the relevance of time and space, the historiography (provenance) of 
data diversity,⁴⁰ and semantic relations of synonyms and overlapping taxonomic 
classes.⁴¹ What holds for biological data also holds for social and cultural data, where 
the contexts of data production are highly relevant. The complexity of metadata 
standardization in this case is augmented by the necessity to have standards 
reflecting a variety of types of data content. While Iconclass,⁴² for example, gives 
a wide range of classes related to Jewish culture, they are mostly about religion, 
rituals, and religious institutions.⁴³ Participation in politics, Jewish welfare, the 
histories of Jewish territorialism, lay Yiddish and Hebrew culture, literature, and 
theater are difficult to catalog using the Iconclass perspective. In addition, the 
contexts of data provenance (how, when, and where data were produced) influ- 
ence cataloging approaches and, consequently, affect the compatibility of data 
produced under different conditions. The layers of intertwined perspectives 
demonstrated in this chapter show how much work is to be done to achieve the 
compatibility of digital infrastructures for minority cultures. 

What we have here is not only data produced in different spatial and tem- 
poral contexts. These are also data produced in cultural environments that were 
very different from (and sometimes hostile to) those of a minority culture. Such 
a dispersal of attitudes and perspectives does not make the work of standardizing 
the approaches to cataloging Jewish culture any easier. Future work may concentrate 
on how to approach the “overly optimistic” task of standardizing data structures 
and formats⁴⁴ for minority cultures. A possible problem here is how to select 
features to build classes if the size of a sample is too small or if a sample is not 


39 Bowker and Star, Sorting Things Out, 321. 

40 See, for example, Bowker, “Biodiversity Datadiversity”; L.M. Schriml et al., “COVID-19 Pan- 
demic Reveals the Peril of Ignoring Metadata Standards,” Scientific Data 7 (2020): article 188. 

41 Beckett W. Sterner, Nico M. Franz, and J. Witteveen, “Coordinating Dissent as an Alternative 
to Consensus Classification: Insights from Systematics for Bio-ontologies,” History and Philoso- 
phy of the Life Sciences 42, no. 1 (2020): article 8. 

42 Iconclass, 2009, accessed April 25, 2021, http://www.iconclass.nl/home. 

43 Accessed April 25, 2021, http://www.iconclass.org/rkd/1/?q=jewish&q_s=1; accessed April 25, 
2021, http://www.iconclass.org/rkd/12A/. 

44 Bowker, “Biodiversity Datadiversity,” 661n4, cited in Claire Waterton, “From Field to Fantasy: 
Classifying Nature, Constructing Europe,” Social Studies of Science 32 (2002): 177-204. 




representative enough.⁴⁵ Exploring possible directions for the complicated task of building 
classifications for the analysis of minority cultures is an exciting research area 
that can provide foundations for finding, sustaining, and disseminating cultures 
in today's world of statistical dominance and historical generalizations that privilege 
empires rather than local data.⁴⁶ This, however, does not solve the problem 
of the authoritative voices behind these classifications and the issues of applying 
standard classifications by curators educated within a certain school of thought 
or research community. Principles that will guide data models will rely on ethical 
norms, educational institutions, academic research, reading lists, and users to 
whom cataloging voices will be addressed. These agents will determine the data 
models and the accessibility of cultures we will see in the near future. 

What is clear from this research is that the technological and data-led decisions 
made about cataloging, indexing, and returning searches on Jewish culture 
inform which parts of Jewish culture are accessible in a digital age. 
The choices made in these platforms and data structures form online Jewish 
historical culture, as reflected in, supported by, and delivered from galleries, 
libraries, archives, and museums. The collective decisions made in these data- 
sets, by many individuals, and differing institutions, often over long timeframes, 
will affect how users can find and navigate Jewish culture for years to come. We 
therefore suggest that the data structures which underpin content management 
systems are worthy of future study, comparison, and critique, when understand- 
ing how minority cultures are delivered to online users of cultural heritage. 


45 L.I. Kuncheva et al., “Feature Selection from High-Dimensional Data with Very Low Sample Size: 
A Cautionary Tale," August 28, 2020, accessed May 12, 2021, https://arxiv.org/pdf/2008.12025.pdf. 
46 G.C. Bowker, Foreword, in All Data Are Local: Thinking Critically in a Data-Driven Society, ed. 
Yanni Alexander Loukissas (Cambridge, MA: MIT Press, 2019); Yanni Alexander Loukissas, All 
Data Are Local: Thinking Critically in a Data-Driven Society (Cambridge, MA: MIT Press, 2019). 
Jakub Mlynář, Jiří Kocián, and Karin Hofmeisterová 

How “Tools” Produce “Data”: Searching 
in a Large Digital Corpus of Audiovisual 
Holocaust Testimonies 


Abstract: The field of Jewish Studies is facing many new challenges as a result 
of ongoing digitization. This chapter focuses on digital oral histories of the Hol- 
ocaust. Following the digital revolution in oral history, many institutions now 
provide access to multiple collections at once. One of the new challenges is thus 
related to the simultaneous availability of several archives, as well as various 
search engines which apply different methods to browse their content. The aim of 
this chapter is to identify and describe participants' practices for working with a 
large corpus of audiovisual Holocaust testimonies, especially in terms of locating 
relevant results within the collection by using three different search systems. We 
have conducted an empirical study in an experimental setting designed to emulate 
work with various search engines. Three pairs of novice users solved ten tasks 
over video-conferencing software, utilizing three different search "tools" (USC 
Shoah Foundation's Visual History Archive, Amalach, and Pixla). Our main find- 
ings consist of formulating a fundamental structure and elements of participants’ 
collaborative work, composed of three complementary actions: testing, sharing, 
and implementing. Furthermore, users obtained the search results by two main 
approaches: aggregation and query refinement. Interestingly, they did not upgrade 
their searching skills progressively, but rather used the current “best knowledge” for 
all the tasks and search engines at once. The participants’ emergent competence 
was continuously developed on the basis of collaborative work with the search 
engines and the results obtained so far through their work on the previous tasks. 


Keywords: Holocaust testimonies, database searching, digital ethnography, oral 
history, social interaction, video analysis 
1 Introduction 


Research in the field of Jewish Studies is facing a number of new challenges as a 
result of ongoing digitization.¹ In this chapter, we focus on the specific domain 
of digital oral histories of the Holocaust.² Following the digital revolution in oral 
history research,³ many institutions now provide access to several divergent collections 
at once. One of the new challenges is thus related to the simultaneous 
availability of multiple archives, as well as various search engines which apply 
different methods to browse their content. In this context, our chapter aims at providing 
a methodological and epistemological reflection on the common approach 
to qualitative research praxis. This approach consists of using search tools 
to obtain data that respond to predefined research questions. Nonetheless, in 
our chapter, we aim to explore how “tools” create “data,” how these two notions 
intertwine in the practical organization of "search" in large digital corpora of 
audiovisual materials, and how these issues might project onto research design 
and the formulation of research questions. 

Since the onset of the digitization wave at the turn of the millennium, which 
engulfed archival sources of various kinds, creators of digital collection systems 
and their respective user interfaces have been asking how the digital turn 
is reflected in the interaction between users and sources. The search for answers is 
mostly conducted in the methodological and conceptual domain of user studies, 
a subfield of human-computer interaction research, which takes into account 
aspects and variables highly relevant for our research as well, such as the diversity 
of users, their level of expertise, search tools at hand, terminology representing 
data and tools, and many more.⁴ Audio and video recordings of interaction 
sequences have been a fundamental method for obtaining relevant data for user 


1 We would like to thank the reviewers and the editors for their thoughtful remarks and sugges- 
tions, as well as the audience at the online conference event for their inspiring comments and ad- 
vice. This text was written with the support of the Ministry of Education, Youth and Sports of the 
Czech Republic, Project No. LM2018101 LINDAT/CLARIAH-CZ, and Charles University Research 
Centre No. 9 (UNCE VITRI). 

2 See, e.g., Cord Pagenstecher, “Testimonies in Digital Environments: Comparing and (De-)Con- 
textualising Interviews with Holocaust Survivor Anita Lasker-Wallfisch," Oral History 46, no. 2 
(Autumn 2018): 109-18, accessed February 10, 2021, http://www.jstor.org/stable/44993579; Vic- 
toria Grace Walden, “What Is ‘Virtual Holocaust Memory’?,” Memory Studies, November 2019, 
doi:10.1177/1750698019888712. 

3 Alistair Thomson, “Four Paradigm Transformations in Oral History," The Oral History Review 
34, no. 1 (2007): 49-70, accessed February 10, 2021, http://www.jstor.org/stable/4495417. 

4 Wendy M. Duff, “User Studies in Archives," in User Studies for Digital Library Development, ed. 
Pierluigi Feliciati, Andy O'Dwyer, and Milena Dobreva (London: Facet Publishing, 2012), 199—207. 
analysis and in this sense, our chapter can be considered closely related to this 
field of study as it is informed by the same type of source material.⁵ Nevertheless, 
unlike the more typical approach of user studies, our chapter neither places its main 
emphasis on the technological dimension of this issue, nor does it do the opposite 
and observe the achieved results through an epistemological prism of research in 
history, oral history, or social sciences. It rather positions itself at an intersection 
of both domains and seeks to contribute by approaching the topics in question primarily 
as situated social practices. In the analysis, we reach our research aims by 
simultaneously confronting novice user pairs with a set of tasks archetypical for 
the field of digital oral history, and draw on their collaborative work with multiple 
tools while focusing on the interactional process of reaching the solutions.⁶ 

This text draws largely on our experience gained as the staff of the Malach 
Centre for Visual History (CVHM) at Charles University in Prague.⁷ Over the last 
decade, CVHM has been providing access for students, researchers, and the general 
public to several established collections of oral history interviews. Since 2009, 
CVHM has been an access point to the University of Southern California Shoah 
Foundation’s Visual History Archive (VHA), which is an ever-growing collection of 
interviews with witnesses and survivors of genocides, especially the Holocaust. At 
the present moment, the VHA contains almost 56,000 audiovisual recordings of 
oral history interviews in more than 40 languages. Since 2018, the Fortunoff Video 
Archive for Holocaust Testimonies of the Yale University Library with more than 
4,400 audiovisual recordings of oral history interviews is also available at CVHM. 
In addition, users in CVHM can work with smaller collections lacking an integrated 
user interface, such as the Refugee Voices archive (150 English interviews), 
and a small portion of interviews from the Melbourne Holocaust Museum, formerly 
known as the Jewish Holocaust Center in Melbourne (15 interviews with people of 
Czechoslovak origin). One of our tasks as employees of the CVHM is therefore to 


5 For instance Joyce C. Chapman, “Observing Users: An Empirical Analysis of User Interaction 
with Online Finding Aids,” Journal of Archival Organization 8, no. 1 (2010): 4-30, accessed Febru- 
ary 10, 2021, doi:10.1080/15332748.2010.484361. 

6 Numerous studies present a structurally similar design, but their main interest lies in large part 
in evaluating whether correct solutions were found; for us, by contrast, the process of negotiating the 
solution method is itself central. For an example of the former, see Sadegh Kharazmi, Sarvnaz Karimi, Falk Scholer, and 
Adam Clark, “A Study of Querying Behaviour of Expert and Non-expert Users of Biomedical Search 
Systems," in Proceedings of the 2014 Australasian Document Computing Symposium (ADCS ’14), As- 
sociation for Computing Machinery, New York, NY, USA, 10-17, doi:10.1145/2682862.2682871. 

7 See Jakub Mlynář, “Malach Center for Visual History,” in Sbornik Semináře o digitálních zdro- 
jich a službách ve společenských a humanitních vědách (WDH 2015), ed. Jaroslava Hlaváčová 
(Prague: Charles University, 2015), 83-89; Jiří Kocián, Jakub Mlynář, and Petra Hoffmannová, 
eds., Malach Center for Visual History on Its 10th Anniversary (Prague: Matfyzpress, 2020). 
assist and advise researchers in their pursuit of audiovisual materials relevant to 
their interests. 

In this chapter, we examine some characteristic problems that emerge during 
work with large digital archives by focusing on the example of the Czech-language 
subsection of the VHA. We first provide a background and rationale for our efforts 
(in Section 2), problematizing the common-sense link between the “tools” and the 
“data.” We then move in Section 3 to the description of an experiment which was con- 
ducted to make visible some of the users' intrinsic practices in working with the data- 
base systems available at CVHM. Analysis of the video recorded experimental ses- 
sions yielded several main findings, which we present in Section 4. In conclusion, we 
discuss the findings in a broader context and address the question of how they relate 
to the use of digital oral history resources such as the VHA in Holocaust research. 


2 Background and Rationale 


Rather frequently, current research praxis in digital environments is conceived 
in terms of using “tools” upon “data.” For example, researchers use search 
systems (“tools”) that allow them to identify relevant units in a corpus of materi- 
als (“data”). In the case of the materials available at CVHM, incoming researchers 
as "our users" ultimately expect to watch interviews (or segments of interviews) 
that are related to "their research topics." In this sense, for searching within 
the contents of the VHA, researchers can use several search systems (“tools”). 
(1) The integral VHA search systems: People Search (approx. 1 million personal 
names), Index Search (around 67,000 hierarchically ordered keywords), Biographical 
Search (date of birth, place of birth, experience, etc.), Places Search 
(utilizing indexing terms with Google Maps), and Quick Search (combining all 
of the above). (2) Amalach search: a phonetic fulltext search engine created at 
the University of West Bohemia (Pilsen, Czechia). Amalach has been available 
at CVHM in beta-testing since 2012, with many new versions introduced since 
then, which also incorporate comments and suggestions from the CVHM visitors 
and staff. (3) Pixla search: a phonetic fulltext search similar to Amalach, but 
voice-controlled, also developed at the University of West Bohemia.⁹ Pixla has 


8 Jan Švec et al., “On the Use of Grapheme Models for Searching in Large Spoken Archives,” in 
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, 
AB, 2018, 6259-63, doi:10.1109/ICASSP.2018.8461774. 

9 Adam Chýlek, Luboš Šmídl, and Jan Švec, “Question-Answering Dialog System for Large Audio- 
visual Archives,” in Text, Speech, and Dialogue: TSD 2019, ed. Kamil Ekštein, 385-97 (Cham: 
been available at CVHM since May 2020 for user testing, which was, however, 
heavily hindered by the COVID-19 pandemic. 

Despite the rich variety and admirable effectiveness of these search systems, 
awareness of how they differ in generating sets of possibly relevant 
interviews is crucial for their successful use in research. A reasonable 
common-sense presupposition of a researcher-user would be that all 
these research tools allow users to search within “the same data,” in our case 
the complete corpus of 558 interviews in the Czech language. However, we argue 
that in practice, the “tools” effectively produce the “data.” As Ørmen puts it, the 
“search results are made in the act of searching.”¹⁰ Not only does each search 
system require a very different search mindset at the input, but the plurality of 
three fundamentally diverse search tools renders it nearly impossible to arrive 
at “the same results” with each of them. Using a “tool” therefore requires 
a fine-tuned way of formulating the search query, and the results provided by 
the search engine are ontologically framed by the boundaries of this formula- 
tion. Ultimately, the users’ knowledge of the search engines and resulting “data- 
sets," gained hermeneutically through numerous iterations of processing their 
research requests, also projects onto the way in which they pose their research 
questions and assess the feasibility of related research designs. At the center of 
this chapter, therefore, we put the users' situated practices rather than the tech- 
nically intended features of the "tools," following the apt advice given by Egon 
Bittner to social scientists already in 1965: “It seems reasonable that if one were 
to investigate the meaning and typical use of some tool, one would not want to be 
confined to what the toolmaker has in mind.”¹¹ 
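
The claim that “tools” produce “data” can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the four interviews, their keywords, and their transcripts are invented, and the two functions are hypothetical stand-ins for a keyword-index search and a full-text search. They do not reproduce how the VHA search systems, Amalach, or Pixla actually work.

```python
# Invented toy corpus: none of these records come from the VHA.
# Each interview has indexer-assigned keywords and a simplified transcript.
CORPUS = {
    "narrator_A": {
        "keywords": {"Pankrác prison", "Prague"},
        "transcript": "we were held in the prison in prague for two months",
    },
    "narrator_B": {
        "keywords": {"Prague", "family life"},
        "transcript": "my uncle was taken to pankrác in 1942",
    },
    "narrator_C": {
        "keywords": {"Pankrác prison"},
        "transcript": "they brought me to pankrác and interrogated me",
    },
    "narrator_D": {
        "keywords": {"deportation"},
        "transcript": "we arrived by train in the morning",
    },
}


def index_search(corpus, keyword):
    """Hypothetical keyword tool: matches only indexer-assigned keywords."""
    return {name for name, rec in corpus.items() if keyword in rec["keywords"]}


def fulltext_search(corpus, phrase):
    """Hypothetical full-text tool: matches only words actually spoken."""
    needle = phrase.lower()
    return {
        name for name, rec in corpus.items()
        if needle in rec["transcript"].lower()
    }


by_index = index_search(CORPUS, "Pankrác prison")
by_fulltext = fulltext_search(CORPUS, "Pankrác")

print(sorted(by_index))                # narrators found via keywords
print(sorted(by_fulltext))             # narrators found via the transcript
print(sorted(by_index | by_fulltext))  # aggregation of both result sets
print(sorted(by_index & by_fulltext))  # narrators both tools agree on
```

In this toy setting the same corpus yields two different answers to "who mentions Pankrác prison": one narrator is indexed under the keyword without ever saying the word, another says it without being indexed. Only one narrator appears in both result sets, so the "data" a researcher ends up with depends on which tool was used and on whether the results were aggregated.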


3 Experiment 


To illustrate our point and inspect our assumptions, we focused on three types of 
research topics at three different levels of concreteness. Accordingly, we designed 
a typology of research questions characteristic of the domains of Jewish and Holo- 


Springer, 2019); Adam Chýlek, Luboš Šmídl, and Jan Švec, “Multimodal Dialog with the MALACH 
Audiovisual Archive," in Proceedings from Interspeech 2019, 3663-64, accessed February 10, 2021, 
doi:10.21437/Interspeech.2019. 

10 Jacob Ormen, “Googling the News: Opportunities and Challenges in Studying News Events 
through Google Search," Digital Journalism 4, no. 1 (2016): 107-24, accessed February 10, 2021, 
doi:10.1080/21670811.2015.1093272. 

11 Egon Bittner, *The Concept of Organization," Social Research 32, no. 3 (Autumn 1965): 249, 
accessed February 11, 2021, https://www.jstor.org/stable/40969788. 
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caust studies based on which we formulated a set of ten specific tasks that served 
as a guiding framework for the observed experimental interaction (see Table 1). 
The tasks were provided to the experiment participants in Czech language as an 
online form which also included blank fields to fill in the results of their work 
(names of narrators). 


Table 1: Search tasks overview. 


Geographical terms 
  Reaching from supra-localized reference points such as buildings, 
  geographically located institutions or street names to macro-level concepts 
    Q1: Find narrators who mention Pankrác prison 
    Q2: Find narrators who mention Vinohradská street 
    Q3: Find narrators born in the territory of interwar Czechoslovakia 
    Q4: Find narrators born in Carpathian Ruthenia between 1919 and 1939 

Paralinguistic phenomena 
  Including functional components of oral and visual history (nonverbal cues, 
  visual demonstrations, emotions, sounds) 
    Q1: Find narrators who interact with their relatives (during the interview) 
    Q2: Find narrators who show their tattoos (during the interview) 
    Q3: Find narrators who show military decorations (during the interview) 
    Q4: Find narrators who show Jewish religious objects (during the interview) 

Abstract concepts 
  Having an implicit or explicit verbal representation, such as identity 
  (a typical relevant topic in human and social sciences) 
    Q1: Find narrators who mention transformation of their religious identity 
    Q2: Find narrators who mention the loss of their identity 


In order to emulate and uncover the fundamental user practices in solving 
qualitatively different types of research questions with various search engines, we 
conducted three experimental sessions with six novice users (university 
students). Reflecting the intersection of disciplines in Digital Humanities, we 
selected participants of the experiment based on their educational background. 
Accordingly, three of them were from the IT sphere while the others were from 
the humanities and social sciences. They worked in pairs through the 
video-conferencing platform Zoom. They had approximately 40 minutes to explore 
the “tools,” deal with the experimental tasks, and complete the form with the “data” 
considered relevant. For analytical purposes, we treated each pair as a collective 
actor operating as an element in the correlative research triad: user - tool - 
search query. 


4 Findings 


The recordings of the video-mediated interactions were analyzed through the 
perspective of qualitative sociological analysis and multimodal interaction 
analysis. A methodological note is in order here. We are aware that the 
experimental setting and the available corpus of recordings could be seen as 
insufficient from a cognitivist point of view, because we purposely did not obtain 
access to the participants’ individual work with their computers (via 
screen-capture apps, eye-tracking, etc.¹²). However, our analytical approach is grounded in 
a praxeological point of view and in naturalistic video-based studies of human 
sociality,¹³ exemplified in earlier studies of video-mediated interaction.¹⁴ We aim 
to describe the participants’ own practices through which they methodically and 
obviously achieve the completion of their tasks. Thus, we record and analyze 
those aspects of the video-mediated interaction that are observably consequential 
for the participants in their collaborative work as a pair. In short, as the 
participants manage to do their assignments without needing to access each 
other’s private on-screen conduct or to locate the precise position of their 
interlocutor’s on-screen gaze, we should also be able to do without it in our analyses. 
Everything that the participants themselves need is already there. The subject 
matter of our research is the witnessable social order,¹⁵ and we take into account 
what the members of the pair themselves observably orient to. 


12 Cf. Robert J. Moore, “A Name Is Worth a Thousand Pictures: Referential Practice in Human 
Interactions with Internet Search Engines,” in Mobile Speech and Advanced Natural Language Solutions, 
ed. Amy Neustein and Judith A. Markowitz (New York: Springer, 2013), 259-86; Robert J. Moore and 
Elizabeth F. Churchill, “Computer Interaction Analysis: Toward an Empirical Approach to 
Understanding User Practice and Eye Gaze in GUI-based Interaction,” Computer Supported Cooperative 
Work 20, no. 497 (2011): 497-528, accessed February 10, 2021, doi:10.1007/s10606-011-9142-2. 

13 See, e.g., Charles Goodwin, Co-Operative Action (Cambridge and New York: Cambridge 
University Press, 2018). 

14 A review is provided by Jakub Mlynář, Esther González-Martínez, and Denis Lalanne, 
“Situated Organization of Video-Mediated Interaction: A Review of Ethnomethodological and 
Conversation Analytic Studies,” Interacting with Computers 30, no. 2 (2018): 73-84, accessed February 
10, 2021, doi:10.1093/iwc/iwx019. 

15 Harold Garfinkel, Ethnomethodology’s Program: Working Out Durkheim’s Aphorism (Lanham, 
MD: Rowman & Littlefield, 2002); Eric Livingston, “Context and Details in Studies of the 
Witnessable Social Order: Puzzles, Maps, Checkers, and Geometry,” Journal of Pragmatics 40, no. 5 
(2008): 840-62, accessed February 10, 2021, doi:10.1016/j.pragma.2007.09.009. 


Through our research, we have identified three basic sequential practices of 
collaborative work in the experimental setting (testing, sharing, implementing). 
Furthermore, the participants seem to employ either aggregation or refinement 
as two general strategies for obtaining relevant results. Among most participants, 
we observed a tendency towards establishing universal solutions usable 
for a larger number of tasks. “Tools” seem to effectively produce “data” 
through the practice of querying, which consists of breaking down the task 
(question at hand) into searchable units: either through “keywording” 
(transforming the question into possible keywords in metadata) or through 
“discoursing” (transforming the question into possible phrases in the speech) - the 
former dominating. 

We should state from the outset that we are conscious of the limits of our 
study, which is primarily intended as exploratory. The experimental setting is 
a very specific (and indeed unusual) situation. Some of the practices described 
below could therefore be an artifact of the experimental design. For instance, if 
participants did not have access to all ten questions for the whole duration of 
their work, their methods of task solution could develop in a quite different way. 
This is a conjecture that can only be evaluated by conducting further empirical 
studies in which these specific conditions are modified. With this in mind, more 
research is needed to confirm and elaborate our findings. In the future, we plan 
to conduct follow-up experiments, this time also providing instructions to the 
participants and observing any possible changes in their search practices. 
Nevertheless, we believe that the results presented below have merit and can serve as a 
useful point of departure for further work. At the same time, we hope to inspire other 
researchers to conduct similar studies in both experimental and - perhaps more 
importantly - naturalistic everyday settings. 

In the following subsections, we present the main observations and findings 
from our analysis of the recordings of the experimental sessions. We describe and 
conceptualize the participants’ practical approach to searching and solving the 
task(s) by employing the available search systems. First, we focus on the practice 
of testing, sharing, and implementing in the course of searching. Second, we move 
to aggregation and refinement as two typical general approaches to search 
querying. Third, we describe how participants transform the experimental search tasks 
into searchables by “keywording” and “discoursing.” 


4.1 Testing, Sharing, Implementing 


As a basic structure within the sequential development of the participants’ col- 
laborative work, we have identified the triad of testing, sharing, and implement- 
ing. Testing consisted of experimenting with the search systems and trying them 
out, typically in a solitary manner. Because of the predominantly individualistic 
nature of testing, the next phase of sharing has to do with intersubjective orien- 
tation, and learning from each other about the knowledge that emerged from the 
separate testing. The next synthetical step is implementing, which has to do with 
a specific use of the search systems for different experimental tasks. Abstraction 
and generalization of findings, in terms of practical procedures for working with 
the search systems and their relation to the displayed set of results, happened 
quite often within this step as well. 

Our initial understanding of testing, sharing, and implementing was to conceive 
of them as three successive steps or phases in the temporal structure of 
collaborative work with the search engines. Progressively, through more refined 
analysis, we have arrived at a dynamic understanding of these concepts as labels 
for mutually interdependent work practices that are recurrently combined 
throughout the session. We believe that the second conception is more useful 
and closer to the reality of the users’ actual work with the digital archives under 
scrutiny. However, the labels “testing,” “sharing,” and “implementing” remain 
approximate glosses, which only serve to underscore certain aspects of the 
participants’ work and provide a general framework for it. In practice, they consist of 
various verbal and nonverbal practices, including not only talk-in-interaction¹⁶ 
and embodied action¹⁷ but also the observable work with the software interfaces 
(such as demonstrations on a shared screen). In the following two subsections, 
we will describe and illustrate some of the more nuanced practices as components 
of the setting-specific actions of testing, sharing, and implementing. 


4.2 Aggregation and Refinement 


This subsection describes and illustrates the two main approaches to task solution 
identified in our analysis of the video-recorded experiments. The first method 
used by the participants is aggregation. This approach seems to aim at generating 
“any” results, even with a low level of relevance - i.e., typically yielding high 
numbers of interviews. The obtained results are then manually sorted, and some of 
them are selected as “proper” results that reasonably answer the question at 
hand. As an example, consider Extract 1 below, in which Participant 1 (P1) and 
Participant 2 (P2) work together on the question “Find narrators born in the territory of 
interwar Czechoslovakia,” already more than 30 minutes into the session. P2 has 
been sharing her screen throughout the whole session. They decided to use the 
VHA system, about which P1 commented that “there you can search for the years of 
birth.” Following P1’s suggestion, P2 types “1918-1939” in the Quick Search field, 
and then they use a suggested search query. The excerpt begins when they have 
just clicked “Search” and are waiting for the results.¹⁸ 


16 Emanuel A. Schegloff, Sequence Organization in Interaction: A Primer in Conversation 
Analysis (Cambridge: Cambridge University Press, 2007). 

17 Christian Meyer, Jürgen Streeck, and J. Scott Jordan, eds., Intercorporeality: Emerging 
Socialities in Interaction (Oxford: Oxford University Press, 2017). 


Extract 1: First group / 34:05-34:40 


(1) ((results appear, see Figure 1)) 


Figure 1: Shared screen after the search results for “1918-1939” appeared. Webpage: 
http://vhaonline.usc.edu, date: 27 November 2020. 


(2) ((1 second pause)) 
(3) P2: Well... 
(4) P1: Try to use the ‘collection,’ what is in that ‘collection’? 
(5) P2: ((clicks on the Collection menu, see Figure 2)) Well ... Probably not. Probably no. 


Figure 2: Shared screen after P2 clicked on the Collection menu. Webpage: 
http://vhaonline.usc.edu, date: 27 November 2020. 


(6) P1: Uuuuhhhh ... ((quietly reading aloud)) Museum of Jewish ... 
(7) P2: ((closes Collection menu, clicks on Language menu)) 
(8) P1: Language ... Yeah set it to Czech language just so we see. 
(9) P2: ((clicks on Czech, results are loading)) 
(10) ((0.8 second pause)) 
(11) P1: Aah, I hope that this could filter those ... ((looks at her second screen)) 
(12) ((results appear on P2’s shared screen, see Figure 3)) 
(13) P2: Hm... ((scrolls down)) 


Figure 3: Shared screen after the search results filtered by language appeared. Webpage: 
http://vhaonline.usc.edu, date: 27 November 2020. 


(14) P1: ((looks back at the shared screen)) Yes. Great. 
(15) P2: Yes? 
(16) P1: And that’s ... OK. So. ((starts writing a name into the form)) Irena 
         Brodová ... 
(17) P2: Next one Jan Kadlec ... 
(18) P1: ((types on her other computer)) Mhmm. 


18 Screenshots displayed as Figures 1-5 and 7 are not illustrative, but come from the 
obtained video recordings and constitute our research data as documents of social interaction. In 
our case, the meeting in the video conferencing platform, including the use of shared screen, 
is the work environment used by the participants in our study. The figures show the on-screen 
appearance of the particular moment of the interaction. This accounts for the slightly impaired 
resolution of the webpages, as they have been shared in real time during the video call. The 
participants’ faces and names have been anonymized. 


As we can see, although P2 shares her screen, she depends - at least in this 
sequence - mostly on guidance and advice from P1 (lines 4 and 8). Furthermore, 
the crucial determination that the results on screen are proper 
answers, which can be considered solutions to their task, is also made by P1 
(line 14). The procedure started with typing a range of years (“1918-1939”), which 
generated the set of results displayed in Figure 1. These results are not treated as 
adequate (lines 3 and 4), as they are visibly not related to Czechoslovakia. Thus, 
the next step consists of finding a way to “filter out” the interviews related to 
Czechoslovakia, which is done first by an attempt to use the “Collections” filter 
(see Figure 2) and later by setting the language of the interview to Czech (lines 8-11). 
The results that appear thereafter (see Figure 3) are treated as satisfactory by 
P1, who produces a “jubilatory ‘yes’”¹⁹ and a positive assessment of the results 
(“great”) in line 14. Then she moves on to writing down the displayed results in the 
online form, which is open on her second computer screen. She reads the first 
name aloud for herself; but perhaps not just for herself, as P2 picks up this 
practice and reads out the following name in the list (line 17), providing P1 with a 
next writable,²⁰ which is confirmed and typed by P1 in line 18. After another 30 
seconds (not included in the transcript) and three more names written in the form, 
they decide that “perhaps this is enough.” 


19 Philippe Sormani, “The Jubilatory YES! On the Instant Appraisal of an Experimental Finding,” 
Ethnographic Studies 12 (2011): 59-77, accessed February 10, 2021, doi:10.5449/idslu-001104716. 

Along with aggregation, participants also used query refinement as 
the second approach to obtaining search results. Its aim is to make the search 
as specific as possible, in order to receive a low number of highly relevant 
interviews, which could be directly copied to the form as “proper” results answering 
the question. An example is provided in Extract 2, which shows Participant 3 (P3) 
and Participant 4 (P4) working on the task “Find narrators who mention 
transformation of their religious identity.” P3, who is sharing her screen, had used 
Pixla to search for the textual query “religious identity” (although Pixla is intended 
as a voice interface). As the excerpt begins, they are discussing the two results 
obtained (see Figure 4). 


Extract 2: Second group / 20:00-20:40 


Figure 4: Shared screen with the search results of a textual query “religious identity” in Pixla. 
Webpage: http://amalach.zcu.cz, date: 27 November 2020. 


20 Lorenza Mondada, “Going to Write: Embodied Trajectories of Writing of Collective Proposals 
in Grassroots Democracy Meetings,” Language and Dialogue 6, no. 1 (2016): 140-78, accessed 
February 10, 2021, doi:10.1075/ld.6.1.05mon. 


(1) P4: Just I think that in this task it’s something else well, ehm, that ... ((P3 
        changes the tab to the online form; P4 reads out part of the task)) 
        transformation of their religious identity, so I think those that changed 
        their religion, or ... their beliefs. 
(2) P3: Yeah? 
(3) P4: So I’d rather formulate the query like uhh, ‘change religion’ ((said in 
        English)). Like change of religion rather than religious identity. 
(4) P3: I have put it in this formulation into another one but ... There it 
        has ... It didn’t find much ((types ‘change’ in the search field)) but it 
        found ‘change politics’ ((said in English)), so maybe this ... 
(5) ((4 second pause, P3 types ‘identity’ in the search field)) 
(6) P3: Yes. ((submits query, results start loading)) 
(7) P3: Into this ... ((switches tab to VHA, see Figure 5)) 


Figure 5: Shared screen after P3 switches from Pixla to VHA. Webpage: http://vhaonline.usc.edu, 
date: 27 November 2020. 


(8) P3: I have put it into this one. 
(9) P4: Mhm. 


In line 1, P4 mentions the task formulation, and after he produces several 
hesitation markers, P3 switches the tab in her browser to the form with the tasks. Now 
P4 uses the on-screen text as a resource and reads aloud the second part of the 
task. Thereafter, he suggests that they should reformulate the search query in a 
way that would better accord with the task: “change religion” (line 3). 
In line 4, P3 responds that she already tried that earlier in the VHA system, but 
she agrees that they can also try it here in Pixla, and in lines 5 and 6 she types and 
submits the query, providing online commentary on her ongoing activity (lines 
6 and 7). Next, while they are waiting for the results to appear (lines 7 to 9), she 
switches the tab again, this time to the VHA interface (see Figure 5), where she 
then proceeds to show P4 the results of her previous attempt (not displayed in 
the transcript). In contrast to Extract 1, P3 and P4 are reformulating and 
specifying the search query rather than hand-sorting the relevant results, possibly simply 
because the obtained results are too few and do not respond to the task 
(see Figure 4). Note that in lines 3 and 4, both participants resort to 
code-switching - i.e., alternation between two languages in the course of a single 
interactional sequence.²¹ Here, rather than switching between two languages for the 
sake of mutual understanding or expressing oneself, the language choice is a 
“significant aspect of talk organization”²² in a different sense: English is used 
because it appears to be taken as the language of the search system. The 
participants in this strip of interaction seem to operate with the assumption that the 
search query must be written in English. Therefore, they formulate the query in 
their talk precisely as it should be typed in - i.e., they use words from the English 
language.²³ 
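
The contrast between aggregation and refinement described in this subsection can be made concrete with a small sketch. It is purely illustrative and does not describe how VHA, Amalach, or Pixla work internally: the `Interview` record, its fields, the toy corpus, and the `search` helper are all hypothetical, and the language check merely stands in for the participants' manual judgment of relevance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interview:
    """Hypothetical metadata record; the fields are assumptions for illustration."""
    narrator: str
    birth_year: int
    birth_country: str
    language: str


# Invented toy records, not data from any of the archives discussed here.
CORPUS = [
    Interview("Narrator A", 1921, "Czechoslovakia", "Czech"),
    Interview("Narrator B", 1925, "Poland", "Polish"),
    Interview("Narrator C", 1930, "Czechoslovakia", "Czech"),
    Interview("Narrator D", 1912, "Austria-Hungary", "Czech"),
    Interview("Narrator E", 1935, "Hungary", "Hungarian"),
]


def search(corpus, **criteria):
    """Return the records whose fields pass every given test (field -> predicate)."""
    return [
        record
        for record in corpus
        if all(test(getattr(record, field)) for field, test in criteria.items())
    ]


def aggregate(corpus, looks_relevant):
    """Aggregation: one broad query, followed by hand-sorting of the many hits."""
    broad_hits = search(corpus, birth_year=lambda y: 1918 <= y <= 1939)
    selected = [record for record in broad_hits if looks_relevant(record)]
    return broad_hits, selected


def refine(corpus):
    """Refinement: a query specific enough that its hits can be copied directly."""
    return search(
        corpus,
        birth_year=lambda y: 1918 <= y <= 1939,
        birth_country=lambda c: c == "Czechoslovakia",
    )


if __name__ == "__main__":
    # The language check stands in for the participants' manual judgment.
    broad_hits, selected = aggregate(CORPUS, lambda r: r.language == "Czech")
    print(len(broad_hits), [r.narrator for r in selected])
    print([r.narrator for r in refine(CORPUS)])
```

In this toy corpus both routes end with the same two narrators. The difference lies in where the selective work is done: by the user after the search (aggregation), or by the query itself (refinement).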


4.3 Querying: Keywording and Discoursing 


After describing the findings on the interactional practices of collaborative work 
that emerged in the observed experimental setting, we will now 
focus in some detail on the relation of the work praxis to the software search 
systems. Responding to the central question posed in the title of the chapter, we 
argue that “tools” produce “data” through the process of querying 
(see Figure 6). This consists of breaking down the question into searchable 
units by way of two practices: “keywording” and “discoursing.” They were not 
equally present as practical methods in the recordings of the three user pairs, but 
participants appeared to orient to them and verbalize aspects of them. We will 
describe and illustrate these practices in the following paragraphs. 


21 See, e.g., Monica Heller, ed., Codeswitching: Anthropological and Sociolinguistic Perspectives 
(Berlin: Mouton de Gruyter, 1988). 

22 Joseph Gafaranga, “Language Choice as a Significant Aspect of Talk Organization: The 
Orderliness of Language Alternation,” Text 19, no. 2 (1999): 201-225, accessed February 11, 2021, 
doi:10.1515/text.1.1999.19.2.201. 

23 It can be noted that although the participants seem to employ this assumption in their work, 
it is not quite correct, because the search system could also process queries in Czech, and in fact 
it would be the right approach as their overall task was to find interviews in the Czech language. 
However, our chapter did not set out as an evaluative undertaking, and we aim at describing and 
explicating the participants’ action rather than assessing it. 


Figure 6: Querying as the interplay of “tools” and “data.” Graph by the authors. 


The practice of “keywording” - turning the task/question into possible keywords 
in archival metadata - is illustrated by Extract 3. We encounter the same pair as 
in Extract 1, but this time they are at the very beginning of the session, discussing 
how to organize their collaborative work. Before the excerpt begins, P1 agrees 
that she will be sharing her screen, and P2 mentions that she has the online form 
open on her second screen. 


Extract 3: First group / 4:07-4:30 


(1) P1: ((switches tabs in her browser)) Hm hm hmm. Yep. 
(2) P2: I would start with some ... Some task which looks like, maybe the 
        last one ... 
(3) P1: ((switches to VHA login screen, then to her mailbox, and to the list of the 
        tasks, see Figure 7)) 


Figure 7: Shared screen with the list of the tasks sent in PDF to the participants for their 
collaborative work. 


(4) P2: That looks like there are enough keywords which could be used for 
        the search. 
(5) P1: Yeah. ((0.5 second pause)) Alright so ... Should we try it? 
(6) P2: There you search like ... ((P2 looks at her second screen; P1 points with 
        the cursor to ‘Podkarpatské Rusi’ [Carpathian Ruthenia] in the text)) 
        Well, ehm, ‘Podkarpatská Rus’ and birth ... 
(7) P1: ((switches to tab with VHA)) 


While P1 prepares her screen and produces a conventionalized melodic triad of 
fillers or placeholders²⁴ which indicates waiting (line 1), P2 turns to her second 
screen, where the online form with the list of tasks is displayed. In line 2, she 
suggests that they should start with a specific type of question - one that “looks like” 
something - but does not finish the phrase and instead changes to a more concrete 
designation (“the last one”). P1 seems to take this utterance as an instruction 
to look at the last question in the list, as she switches tabs to the PDF with the 
tasks (which the participants received by e-mail just before the experiment; see 
Figure 7), locating the last question: “Find narrators born in Carpathian Ruthenia 
between 1919 and 1939.” Meanwhile, having already specified the exemplar 
instance of a more general question type, P2 repeats “that looks like” in line 4 and 
then makes explicitly relevant the use of keywords as a search method. P1 aligns in 
line 5 (“Yeah”), holding her cursor under the word “Podkarpatské” (“Carpathian”) 
on her shared screen, and then suggests that they could open one of the search 
systems to “try it.” P2 does not align but continues her explanation, describing 
very precisely how the process of “keywording” actually works: the question is 
transformed into two searchable items, “Carpathian Ruthenia” (“Podkarpatská 
Rus” in Czech) and “birth” (“narození” in Czech). Note that neither of these terms 
is present as such in the question: P2 lemmatizes “in Carpathian Ruthenia” (“na 
Podkarpatské Rusi”) and instead of using “born in” she suggests the noun “birth.” 


24 Nino Amiridze, Boyd Davis, and Margaret Maclagan, eds., Fillers, Pauses and Placeholders 
(Amsterdam and Philadelphia: John Benjamins, 2010). 

The alternative practice of “discoursing” - turning the task/question into 
possible phrases in the archived speech - is illustrated by Extract 4. Twenty 
minutes into the session, the pair consisting of Participant 5 (P5) and Participant 
6 (P6) works on the question “Find narrators who show their tattoos (during the 
interview).” This pair was the only one that decided not to share one of the 
participants’ screens for the whole duration of the experimental session (although 
they did share it occasionally in an ad hoc manner). Just before the excerpt starts, 
P5 summarizes that they have done three questions out of ten, being around 15 
minutes into the job. 


Extract 4: Third group / 19:52-20:58 


(1) P5: Well, now I think we need to find out how to search for those that 
        are about showing something. Because that’s, apart from the 
        Carpathian Ruthenia, those are all the remaining questions. ((Laughs)) 
(2) P6: ((Laughs)) So according to what is going on in the video? 
(3) P5: (What is) going on in the interview, here, right ((starts reading from 
        the list of tasks)) - interact with their relatives, show their tattoo, 
        show their military decorations, show Jewish religious objects. 
(4) ((1.2 second pause)) 
(5) P5: And then ((continues reading aloud)) mention the loss of their 
        identity, mention transformation of their religious identity, and 
        then there is the Carpathian Ruthenia. 
(6) P6: Mhm. So there must be ... For this there must be some special tool. 
        Right, probably? 
(7) P5: We must somehow find out how to use it. 
        ((2 second pause, two audible clicks from P6)) 
(8) P5: So I will try ... I’ll try that Amalach, and I’ll try for instance tattoo, 
        just like ... Like ‘look at my tattoo’? 
(9) P6: ((Laughs)) ‘Look at my tattoo.’ Heh heh heh. ‘Watch this!’ Heh heh 
        heh heh. 
(10) P5: ((Laughs)) 


In line 1, P5 states that the remaining questions are questions of a certain type - 
they have to do with “showing something.” P6 joins her in laughter and asks a 
follow-up question about the nature of this question type, which P5 confirms and 
specifies by reading aloud all relevant parts of the questions that they still 
need to do (in lines 3 and 5). Note how precisely her categories overlap with 
our own typology displayed in Table 1. First, in line 3, she lists the questions that 
include a visible feature of the narrator or their environment. After a pause, she 
lists a further category of questions, which have to do with identity, and then she 
mentions the question about Carpathian Ruthenia, which does not fit either of 
those categories; they had decided earlier (after spending some time 
attempting a search) to put this question aside for later. P6 responds 
by producing first an acknowledgment token²⁵ and then a formulation of the gist²⁶ 
of P5’s previous turns (line 6), which specifies that there must be a “special tool” 
for these categories of questions. P5, perhaps in a corrective manner, responds 
that they must “find out how to use it” - i.e., they already have the tool but they 
need to acquire the competence to use it efficiently. After 2 seconds of silence, she 
suggests that she will use Amalach and type in an imagined speech phrase that 
could possibly accompany a video-recorded scene of someone showing a tattoo 
(line 8). Although her suggestion is then treated (first by P6 and then also by P5 
herself) as laughable, even somewhat ironicized by P6 (“Watch this!” in line 9), 
and we are indeed not sure whether she typed the phrase into the search field 
(her screen is not shared and she does not account for it), this sequence clearly 
shows the participants’ orientation to the practice of “discoursing” as a form of 
querying. The fact that it is treated as laughable might indicate that it is more unusual 
in comparison to the practice of “keywording,” which is utilized much more 
routinely (see Extract 3). 

25 Gail Jefferson, “Notes on a Systematic Deployment of the Acknowledgement Tokens ‘Yeah’ 
and ‘Mm hm’,” Papers in Linguistics 17 (1984): 197-206, accessed February 12, 2021, 
doi:10.1080/08351818409389201. 

26 John Heritage and D. Rod Watson, “Formulations as Conversational Objects,” in Everyday 
Language: Studies in Ethnomethodology, ed. George Psathas (New York and London: Irvington, 
1979), 123-62. 

27 Cf. Elizabeth Holt, “On the Nature of ‘Laughables’: Laughter as a Response to Overdone 
Figurative Phrases,” Pragmatics 21, no. 3 (2011): 393-410. 


It seems that in the VHA, perhaps given the abstraction required to produce 
some results via a search query, search results obtained by way of “keywording” 
are treated as less “certain” and require further checking. On the other hand, 
Amalach (and Pixla), in their “concreteness,” provide results through 
“discoursing” practices with higher certainty, which can also be checked more easily (and 
are less opaque). Such a predilection for keywording or discoursing, contingent 
upon particular search systems, seems true, however, only to a certain extent. In 
the analyzed interactions, the participants often used the same querying 
practices independently of the type of search engine. For example, querying by 
keywording in VHA not only generated a set of resulting interviews, but also 
offered new keywords that came up as results of the first keyword search. These 
newly “discovered” keywords were then typed into other search systems such as 
Amalach - i.e., used as resources in the “discoursing” practice. 
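
The two querying practices can also be sketched as two small transformations of a task. This is a minimal illustration under stated assumptions, not the indexing logic of any of the search systems: the `KEYWORD_LEXICON` and `SPEECH_PHRASES` tables are hypothetical and hand-made, with the entries taken from Extracts 3 and 4 ("birth," "Carpathian Ruthenia," "look at my tattoo"). Letting `discoursing` reuse the output of `keywording` is a simplification that mirrors the reuse of discovered keywords noted above.

```python
# Hypothetical mapping from wording in a task to metadata-style keywords.
# The entries are hand-made for illustration; an archive's indexing
# vocabulary would differ.
KEYWORD_LEXICON = {
    "born": "birth",
    "carpathian ruthenia": "Carpathian Ruthenia",
    "tattoos": "tattoo",
}

# Hypothetical phrases a narrator might say while the queried event happens.
# "look at my tattoo" is the phrase proposed by P5 in Extract 4.
SPEECH_PHRASES = {
    "tattoo": ["look at my tattoo"],
}


def keywording(task):
    """Turn a task into candidate metadata keywords (for a keyword index)."""
    lowered = task.lower()
    return [kw for wording, kw in KEYWORD_LEXICON.items() if wording in lowered]


def discoursing(task):
    """Turn a task into candidate spoken phrases (for a search in the speech)."""
    phrases = []
    for keyword in keywording(task):
        # A keyword without an imaginable spoken phrase contributes nothing.
        phrases.extend(SPEECH_PHRASES.get(keyword, []))
    return phrases


if __name__ == "__main__":
    print(keywording("Find narrators born in Carpathian Ruthenia between 1919 and 1939"))
    print(discoursing("Find narrators who show their tattoos (during the interview)"))
```

The sketch also shows why the two practices suit different tasks: a task about place and date of birth decomposes readily into keywords but suggests no obvious spoken phrase, whereas a task about something shown on camera can only be approached through what the narrator might say at that moment.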


5 Concluding Remarks 


The aim of this chapter was to identify and describe participants’ practices for 
working with a large corpus of audiovisual Holocaust testimonies, especially in 
terms of locating relevant results within the collection by using three different 
search systems. We started from the assertion that various search devices 
and systems, rather than working as unproblematic “tools,” dynamically 
produce and practically construe what can be conceived as “data.” Instead 
of formulating further insights on a theoretical or conceptual basis, we 
tackled our subject matter through a small-scale empirical study. It consisted of an 
experimental setting where three pairs of novice users solved ten tasks over 
video-conferencing software, utilizing three different search “tools” (VHA, Amalach, 
and Pixla). The experiment was designed to emulate work with various search 
engines, such as that of the researchers working in the Malach Center for Visual 
History. 

Our main findings presented in this chapter consist of formulating a fun- 
damental structure and elements of participants' collaborative work, which 
appears to be composed of three complementary actions: testing, sharing, and 
implementing. The tasks were solved, and relevant results obtained, by two main 
approaches: aggregation and query refinement. Each single search act is taken
as an instance or an example of some general features of the systems, thus far
unknown to the users, which are to be discovered and identified. The systems
are then discussed regarding their utility for solving questions of “a certain kind”
(cf. Table 1). In order to conduct searching, the participants seek to transform the
textual task into a working query that returns a set of relevant results. They do 
this mostly by keywording (turning the question/task into a set of keywords), and 
much less often by discoursing (turning the question/task into a possible expres- 
sion in natural language). This might be partially caused by the force of habit 
as searching by keywords is the prevalent practice when using database systems, in
contrast to direct interaction with such a system in natural language, though the
Pixla system is specifically designed for this purpose. However, it became appar- 
ent that participants were able to differentiate the effectiveness of both practices 
in different environments. The practice of “keywording” seems to stimulate 
abstraction and is used more often in regard to the VHA search interface, while 
the practice of “discoursing” stimulates concretization and is used more often in 
regard to Amalach and Pixla systems. All the concepts as presented in this chapter 
were drawn inductively and illustrated by examples from the empirical materials, 
aiming to capture the dynamic process of development and utilization of progres- 
sively improving shared knowledge of the workings of the search systems. 

Nevertheless, we have realized that the participants surprisingly did not 
upgrade their searching skills progressively step by step. Their emergent knowl- 
edge of the search systems rather seemed to be implemented hermeneutically, 
again and again, for the whole set of experimental tasks, leading to an improvement
of the whole list of answers each time a new attribute of the search systems
had been discovered. Most of the time, they used the current “best knowledge”
for all the tasks and all search engines at once. The systems and the tasks were
taken for all practical purposes not as separate entities, but as parts of a whole.
Also, the “best knowledge” did not alter depending on the qualitative differences
between the search questions. Accordingly, the “best knowledge” has universalizing
tendencies: the participants aim to establish practices which are utilizable
for solving multiple tasks. Still, such knowledge is not a static entity, but it is 
continuously improved on the basis of participants' collaborative work with the 
search engines and the results obtained so far, in and through their work on the 
previous tasks. 

One of the important findings of our study points to what we call the “googling
paradigm,” thus indicating the user's orientation to the search process as
not requiring a knowledge of the search tool’s inner workings.²⁸ The practices of
breaking down questions into searchable queries (keywords or discursive units) 
establish the horizon of the materials to be searched. Furthermore, it also seems 
to structure the participants’ practical engagement with the user interfaces - e.g., 
in the case of VHA, during the time dedicated to the experiment, our partici- 
pants (as untrained novice users) very rarely moved beyond the simple “Quick 
Search," which is visually highlighted and designed in a way that resembles a 
Google search field. In 2010, Lee et al. called Google “one of the most influential 


28 Hillis, Petit, and Jarrett find the naturalization of “knowledge of” search without “knowledge
about” how it actually works a direct consequence of the Google “magic box.” Ken Hillis, Michael
Petit, and Kylie Jarrett, Google and the Culture of Search (New York: Routledge, 2013), 14-15.

symbols of the new Internet paradigm” since the turn of the century. Another
ten years later, we are witnessing a googling paradigm - to which new members 
of society are ordinarily introduced?? — as it operates in a broad cultural environ- 
ment beyond simple web searches and structures the way we routinely approach 
the very procedures of locating relevant information. 

Though the last mentioned observations reflect the recent digitalization of 
society in general, one must consider the specificities of the Holocaust research 
domain. Our study is inherently set in this field as well, already by default owing to 
the nature of the sources we use. It is necessary to bear in mind the singularity of the 
Holocaust as the ultimate cultural trauma, constant memento, and a negative point 
of reference for the contemporary “Western” value system. In this respect, among 
the primary imperatives (both research and ethical) are adequate source representa- 
tion and interpretation, which is perhaps even more crucial when working with 
survivor testimonies. As we argued, search tools have a direct impact on displayed 
data, while the process of searching can lead to de-contextualization and re-con- 
textualization of the original archival recordings. Not only can the user overlook the 
broader context of the found "segment" within an individual's entire personal life 
story, but imperfectly formulated search queries may also cause omission of some 
important aspects of the historical reality. Although any user interface generates a 
certain learning curve, we observed a considerable lack of adjustment in participants'
actions that would respond to the particularities of the systems in use. This
may have dissuaded them from discovering more about the tools as
well as the resulting sources. We believe that addressing this is a critical goal for the future
development of technologies for accessing Holocaust-related sources. Whatever the “tools”
may be, users should be made clearly aware of how they arrived at their “data,”
what those results represent, and how they thus ultimately affect their research.


Abbreviations 


CVHM - Malach Center for Visual History at the Charles University 
PDF - Portable Document Format 
VHA - Visual History Archive of the University of Southern California Shoah Foundation 


29 Sang Hoon Lee et al., “Googling Social Interactions: Web Search Engine Based Social Net-


N-gram-based Content Indexing:
Semiautomated Analysis of Holocaust 
Testimonies 


Abstract: Holocaust interview collections are key resources to preserve and share 
the memories of genocide survivors. As the size of testimonial collections makes 
manual analysis unwieldy, Digital Humanities (DH) methods can help develop 
a comprehensive view of the collections' contents and narratives. Developing 
nuanced and comprehensive methods to index and summarize the content of 
these interviews is central to ensuring survivor testimonies are accessible and 
searchable even in large-scale archives. This chapter develops a content indexing 
system that is not limited to the identification of keywords to summarize content, 
but rather includes longer phrases and emotional expressions. In particular, the 
chapter presents a semiautomated DH approach based on N-grams, i.e. frequent 
sentence chunks of ‘N’ words in a row, tested on David Boder's corpus of Holocaust 
testimonies. The results show that this approach allows us to identify speech patterns
and narrative structures that go unseen in traditional keyword-based indexing
of the survivors' testimonies, including structural, non-verbal categories like
uncertainty, reticence, and emotional insistence on time references. This method also
helps shift our perspective from the narrative we expect survivors to use to the 
actual narrative they use, which may go unnoticed if we are not actively searching 
for it or if we only focus on major keywords. 


Keywords: Holocaust, testimony, memory, Digital Humanities, interview archives, 
N-grams, content indexing, narrative structures 


1 Digital Archives of Holocaust Testimonies 


Digital archives of Holocaust and genocide testimonies are invaluable resources 
to preserve the histories and memories of survivors through large collections of 
interview recordings and transcripts. As the number of living Holocaust witnesses 
decreases and the scale of digital interview collections increases, these archives 
raise a pressing question of access to memory: what can we learn from these tes- 
timonies? What are we looking for when we consult them? The answers we give to 
these questions determine the technical means we develop to concretely access, 
read, or listen to survivors' testimonies, especially when the size of interview collections
makes manual analysis impossible. This chapter addresses the question


Open Access. © 2022 Anna Bonazzi, published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://doi.org/10.1515/9783110744828-005

of content indexing: in other words, how to annotate and categorize the content 
of thousands of hours of interviews in order to grasp their meaning at a glance 
or find specific topics across the whole collection. The approach I propose is a 
frequency-based analysis of interview transcripts based on N-grams aiming to 
identify recurring themes, expressions, and speech patterns. In particular, this 
chapter will focus on the interviews from the Boder corpus, a collection of 119 
survivor testimonies in about nine languages recorded by Russian-born American 
psychologist David Boder in Displaced Persons camps in France, Switzerland, 
Italy, and Germany in 1946, just a year after the end of World War II. Boder inter- 
viewed both Jewish and non-Jewish witnesses, survivors, and displaced persons, 
collecting the earliest known archive of oral Holocaust history.¹

To contextualize my work,² let us take a look at the way content indexing
happens in interview archives such as the USC Shoah Foundation³ archive and
Boder's collection.⁴ These archives typically develop a keyword-based indexing
process: they assemble a controlled vocabulary of relevant keywords and man- 
ually annotate interviews in regular units (for example, minute by minute, like 
the Shoah Foundation, or page by page, like the Boder collection). This indexing 
approach is useful to quickly summarize certain aspects of interview contents, 
taking advantage of the regular and scalable nature of controlled vocabularies: 
it gives researchers a map to navigate the collection, find important factual ref- 
erences, and track certain topics across multiple interviews. The type of access 
granted by keywords highlights the role of testimonies as repositories of evi- 
dence, such as facts, names, and crucial historical references. 

However, like all manual approaches, a keyword-based indexing system inevi- 
tably has a few blind spots. The main limitation of keywords is that they are prede- 
termined: this approach tends to decide which concepts are significant and worth 
recording before hearing what survivors have to say. This decision is often influ- 
enced by existing research interests or expectations, for example preserving the 
memory of trauma or details about concentration camps. The risk of this method 


1 Alan Rosen, The Wonder of Their Voices: The 1946 Holocaust Interviews of David Boder (Ox- 
ford: Oxford University Press, 2010). 

2 My research was realized as part of the Holocaust Studies Digital Humanities Lab at UCLA, a 
research group including Prof. Todd Presner, Dr. Rachel Deblinger, Dr. David Shepard, Lizhou 
Fan, Michelle Lee, Wanxin Xie, Omar Hassan, and others. The group works on various DH-based 
approaches to the analysis of genocide interviews (primarily Holocaust, Rwandan Genocide, and 
Nanjing Massacre testimonies). 

3 "Visual History Archive," USC Shoah Foundation. 

4 “Voices of the Holocaust,” Illinois Institute of Technology.

is that it centers the researcher's expectations more than the survivor's actual testimony.
This can lead to a narrow gaze that mostly notices what already belongs
to the canonical Holocaust narrative, which is problematic because survivors' tes- 
timonies exist before and outside of the Holocaust narrative as we know it today: 
they contribute to it but are not entirely defined by it. 

David Boder's indexing system is a good example of a deductive, concep- 
tually predetermined approach. Boder undertook an extensive interview project 
with Holocaust survivors in Western Europe as early as 1946, publishing some of 
the interviews as curated autobiographies.⁵ With the help of his wife and graduate
students, he experimented with various analytic methods, including an early
quasi-computational approach. As a pioneer in quantitative psychology, he was 
interested in the effects of trauma on catastrophe survivors, so he transcribed, 
annotated, and analyzed his recordings with the aim to quantify the survivors' 
response to trauma. One of the four main categories of his indexing system was 
dedicated to nuanced expressions of trauma. While this indexing system proved 
helpful for Boder's research goals, it offers a good example of a content classifica- 
tion approach that tends to find only what it set out to look for. 

Another crucial limitation of keywords is that this method tends to focus on 
what survivors say, mostly in terms of nouns and verbs, and does not easily allow 
us to understand how they say a certain thing, or what they choose not to say. 
In addition to keywords, there are other ways we may define what "content" is 
in a survivor's testimony. We may want to look at interviews not only as perma- 
nent repositories of historical evidence, but also as personal stories that center 
the survivor's state of mind and help us understand how mainstream genocide 
narratives developed and what other narratives we might be overlooking. Pace, 
language choices, narrative structures, insistence, uncertainty, silence, all play 
a role in the expression of a unique and historically situated memory. Text is far 
from the only aspect of an interview that we may want to consider. For example, 
in the case of an audio archive like Boder's, much can be understood by paying 
attention to the speed of the dialogs, specific moments of silence, annotations of 
emotion, and the change in pitch in the voice of the interviewees.⁷ Boder himself
tried to preserve some of this paratextual and non-verbal information in his 
manual processing of the interviews. While his focus was primarily on the spoken 


5 David P. Boder, Topical Autobiographies of Displaced People Recorded Verbatim in Displaced 
Persons Camps with a Psychological and Anthropological Analysis (Chicago: D.P. Boder, 1950-1957). 
6 Alan Rosen, The Wonder of Their Voices. 

7 See for example Todd Presner's analysis of the audio track of some USC Shoah Foundation
interviews in Todd Presner, “From Wire Recorder to Database.” Meyerhoff Lecture, United States
Holocaust Memorial Museum, October 18, 2018, accessed June 30, 2021, https://youtu.be/2aNufsZFfkM.

word, he took care to annotate details like long pauses, laughter, crying, and
other visible expressions of emotion.⁸ The voice and body language of survivors,
together with the outbursts of emotion that punctuate their testimonies, have the 
potential to enrich or even completely alter our understanding of interview tran- 
scripts. Ideally, a comprehensive indexing system of audio or audiovisual sources 
could offer its users not only references to spoken content, but also to emotion 
and body language. In an attempt to reimagine content indexing, this study only 
focuses on spoken word transcripts. However, by exploring additional options of 
what an index is supposed to record, it points to the presence of multiple layers of 
expression and meaning in recorded interviews. 


2 N-grams 


In order to expand the descriptive ability of indexing to include more aspects of 
a testimony, I experiment with inductive content annotation based on N-grams, 
meaning short segments of n words or characters (in this case, words). N-grams 
are a classic tool of computational linguistics and have long been used for text 
analysis and content mining in large data collections.⁹ They are particularly
useful for text categorization and the identification of topic similarity, including
in multilingual text collections, as they are language-independent.¹⁰


8 The graph in the following link provides an idea of the kind of emotions Boder transcribed
and raises questions concerning our assumptions of the kind of emotions a survivor is “supposed”
to display: accessed June 30, 2021, https://drive.google.com/file/d/1y68uqxwIhf3ZeaG, 7Aql1Inr .
jo-nIju7/view?usp=sharing (UCLA Holocaust Studies DH Lab Work, 2019).

9 See, e.g., Anne Burdick et al., Digital Humanities (Cambridge, MA: MIT Press, 2012); Susan 
Schreibman, Ray Siemens, and John Unsworth, A Companion to Digital Humanities (New York: 
John Wiley & Sons, 2008). 

10 See for example Riyad Al-shalabi and Rasha Obeidat, “Improving KNN Arabic Text Classification
with N-Grams Based Document Indexing,” in Proceedings of the 6th International Conference
on Informatics and Systems INFOS2008 (Cairo, 2008), 108-12; Marc Damashek, “Gauging Similarity
with N-Grams: Language-Independent Categorization of Text,” Science 267, no. 5199 (1995):
843-48; Armand Joulin et al., “Bag of Tricks for Efficient Text Classification,” arXiv:1607.01759
[cs], August 9, 2016, accessed June 30, 2021, http://arxiv.org/abs/1607.01759; Artur Šilić et al.,
“N-Grams and Morphological Normalization in Text Classification: A Comparison on a
Croatian-English Parallel Corpus,” in Progress in Artificial Intelligence. EPIA 2007, ed. J. Neves, M.F.
Santos, and J.M. Machado, 671-82. Lecture Notes in Computer Science, vol. 4874 (Berlin: Springer,
2007), accessed June 30, 2021, https://doi.org/10.1007/978-3-540-77002-2_56; Zhihua Wei et
al., “N-Grams Based Feature Selection and Text Representation for Chinese Text Classification,”
International Journal of Computational Intelligence Systems 2, no. 4 (December 2009): 365-74,
accessed June 30, 2021, https://doi.org/10.1080/18756891.2009.9727668.

In short, I split interview transcripts into snippets of two to six words, compute
their frequency, and annotate information about the words and parts of speech 
each N-gram contains. Then I cluster them in a semiautomated way and manually 
analyze them to see which themes or recurring structures emerge from the text. 
This method essentially implies reading through text in a non-linear way: looking 
at groups of text units longer than words but shorter than paragraphs or pages 
helps us identify expression patterns and insistence on certain phrases. Most 
importantly, allowing an unbiased process¹¹ to decide for us which text features
seem significant can help us notice categories we may not have been looking for at 
the outset, either because they have to do with structure more than with content, 
or because the content they point to may be unexpected. 

This approach is situated somewhere between close reading and unsuper- 
vised topic modeling: it resembles the former in its attention to sentence struc- 
tures and language choices, and it resembles the latter in its ability to quickly iden- 
tify the main themes of a text. As a semiautomated method, the N-gram approach 
presents the strengths and weaknesses of both automated categorization and 
manual analysis: while the identification of significant categories follows more 
traceable and objective criteria than a manual approach, it still requires subjec- 
tive judgment in the selection and naming of those categories. Moreover, this 
method relies heavily on frequency, which can be a drawback: in a text corpus, 
not all that is frequent is relevant and not all that is relevant is frequent. Still, 
my analysis suggests that sometimes elements of a testimony which are both fre- 
quent and relevant don't easily stand out in a manual or keyword-based analysis, 
while they do through an N-gram-based approach. 

The Boder corpus, given its small size as well as its idiosyncratic and multilin- 
gual character, provides a useful case study to test out various indexing methods 
which can then be scaled up to larger collections.¹³ In what follows, I describe
two uses of N-gram-based indexing: one for a corpus-wide analysis and one for 
the analysis of a single interview, to demonstrate the value of the proposed sem- 
iautomated DH approach. 


11 Unbiased is meant here in the computational sense (in an analogous machine intelligence 
sense, unsupervised), not the human one. 

12 See for example Hamed Jelodar et al., “Latent Dirichlet Allocation (LDA) and Topic Modeling:
Models, Applications, a Survey,” Multimedia Tools and Applications 78, no. 11 (2019): 15169-211.

13 Topic modeling is not very helpful on a relatively small and multilingual corpus like this one,
but other computational methods proved effective, such as semantic triplets (under development
by Lizhou Fan from the Holocaust Studies DH Lab at UCLA) and N-grams.


3 Corpus-wide Analysis


Using N-gram-based indexing on an entire interview corpus can precede or com- 
plement close reading of specific samples, as it offers an insight into the hun- 
dreds of interviews we may not get to examine manually. Identifying the main 
structural and thematic patterns of a collection is also useful to enable a compar- 
ison between different testimony collections, analyze the evolution of testimony 
as a genre, and understand how a certain Holocaust narrative was established. 

To begin with, each digitized interview transcript was transposed to XML 
format and annotated with Boder's original information about the interview's 
date, location, and language, and the survivor's name, age, and religion. Each line 
of the interview was then automatically annotated with data about the speaker,
language,¹⁴ and speed of the dialog (as Boder's interviews are highly multilingual,
it was important to label the languages used in every line).¹⁵ The text was
annotated with TreeTagger,¹⁶ a multilingual Part-Of-Speech (POS) tool that identifies
grammatical parts of speech such as nouns, verbs, adverbs, etc. In order to
filter out semantically weak N-grams (such as “and then we"), expressions that 
did not contain at least a verb, noun, or adjective were excluded from further 
analysis. POS annotation relies on context and frequency of a word, so it works 
better on longer texts and does not yield sensible results with short text snippets 
like an N-gram. For this reason, the text was first annotated as a whole and then 
segmented into N-grams, with each word carrying its POS tag (e.g., “meine-adjective
Mutter-noun”). The text was segmented into N-grams of two to six words:
for example, a sentence like “the Jewish Council got a call from the police”¹⁷ would
produce 3-grams “the Jewish Council,” “Jewish Council got,” “Council got a,” “got
a call,” 4-grams “the Jewish Council got,” “Jewish Council got a,” and so on. These
N-grams were marked with their interview and line ID number (so that they can 
be traced), then they were sorted by frequency to identify the most common 
ones: while a segment like “call from the” may appear once or twice in the whole 
text and not carry much meaning, N-grams like “the Jewish Council” or “from the 
police” may be more frequent and tell us more about the testimony. 
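
The segmentation, filtering, and frequency steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the tag names (noun, verb, adjective), the function names, and the lowercasing of N-grams are choices made for the example only, and the toy input is tagged by hand rather than by a POS tagger.

```python
from collections import Counter

# Tags counted as semantically "strong" (hypothetical tag names,
# not the tagset of any particular POS tagger).
CONTENT_TAGS = {"noun", "verb", "adjective"}


def ngrams(tagged_tokens, n_min=2, n_max=6):
    """Yield every run of n_min..n_max consecutive (word, tag) pairs."""
    for n in range(n_min, n_max + 1):
        for start in range(len(tagged_tokens) - n + 1):
            yield tuple(tagged_tokens[start:start + n])


def has_content_word(gram):
    """True if the N-gram contains at least one noun, verb, or adjective."""
    return any(tag in CONTENT_TAGS for _, tag in gram)


def count_ngrams(lines):
    """Count N-grams and remember where each one occurred.

    `lines` is an iterable of (interview_id, line_id, tagged_tokens),
    where the tokens were tagged on the full text before segmentation.
    """
    counts = Counter()
    locations = {}
    for interview_id, line_id, tagged in lines:
        for gram in ngrams(tagged):
            if not has_content_word(gram):
                continue  # drop semantically weak N-grams
            # Lowercasing is an assumption; it merges case variants.
            key = " ".join(word.lower() for word, _ in gram)
            counts[key] += 1
            locations.setdefault(key, []).append((interview_id, line_id))
    return counts, locations


# Toy input: the example sentence from the text, tagged by hand.
tagged_line = [
    ("the", "determiner"), ("Jewish", "adjective"), ("Council", "noun"),
    ("got", "verb"), ("a", "determiner"), ("call", "noun"),
    ("from", "preposition"), ("the", "determiner"), ("police", "noun"),
]
counts, locations = count_ngrams([(56, 77, tagged_line)])
three_grams = [gram for gram in counts if len(gram.split()) == 3]

# Sorting by frequency surfaces the most common N-grams first.
most_common = counts.most_common(10)
```

On a single sentence every N-gram occurs once, so the frequency ranking only becomes informative when the same function is run over all lines of a corpus. Keeping the interview and line identifiers alongside the counts is what makes each frequent N-gram traceable back to its context.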


14 Two Python packages were used: TextBlob for lines below 11 characters (high accuracy but 
daily usage limitations) and langdetect for all other lines (high accuracy but only on longer 
strings; no usage limitations). 

15 This was also useful to analyze the use of linguistic code switching by survivors. 

16 Helmut Schmid, “Probabilistic Part-of-Speech Tagging Using Decision Trees,” in Proceedings 
of International Conference on New Methods in Language Processing (Manchester, 1994). 

17 Boder corpus, interview with Abraham Kimmelmann, August 27, 1946 (transcript n. 56, line
n. 77, translated).

N-grams with a minimum frequency of five were manually examined and categorized.
To begin with, only N-grams in the largest corpus languages (German,
French, English) and Yiddish were considered, while N-grams in other languages 
(Russian, Polish, Spanish) were only included if they had a high frequency and 
seemed related to existing categories. N-grams were labeled for their content (for
example “camp” or “family”) or for their structure (for example “narration” or
“uncertainty”) across different languages. To do so, N-grams from all languages
were manually labeled in English to group together related expressions in different
languages.¹⁸ For example, expressions like “zwölf 12 Uhr,” “in der Nacht,”
“o'clock in the,” “at that time,” “de ce moment” were all labeled as “time,” while
expressions like “nisht gevust vos,” “I don't know,” “ich weiß nicht” were labeled
as “don’t know / don't remember / don’t want [to tell].” I followed a conservative
labeling process, only attributing a label to reasonably unambiguous expressions
and marking N-grams as “undefined” when in doubt about their meaning, for
example in cases like “no such thing as” or “auf der Strasse,” in order to avoid
overinterpreting sentence snippets that were inevitably out of context and complicated
by their multilingual nature.
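
The labeling step can be pictured as a hand-made lookup table applied to the frequency list. The sketch below is only an illustration: the label table contains a handful of the examples quoted above, the frequencies are invented, and summing occurrences per category is an assumption about how category totals might be computed (counting distinct N-grams per category would be an equally plausible reading).

```python
from collections import Counter

MIN_FREQUENCY = 5  # threshold stated in the text

UNCERTAINTY = "don't know / don't remember / don't want"

# Hand-made label table; entries are examples quoted in the text.
LABELS = {
    "in der nacht": "time",
    "at that time": "time",
    "de ce moment": "time",
    "nisht gevust vos": UNCERTAINTY,
    "i don't know": UNCERTAINTY,
    "ich weiß nicht": UNCERTAINTY,
}


def categorize(ngram_counts, labels=LABELS, min_frequency=MIN_FREQUENCY):
    """Sum N-gram frequencies per English label.

    N-grams below the threshold are skipped; N-grams without a
    label are collected under "undefined", mirroring the
    conservative labeling described in the text.
    """
    totals = Counter()
    for gram, freq in ngram_counts.items():
        if freq < min_frequency:
            continue
        totals[labels.get(gram, "undefined")] += freq
    return totals


# Invented frequencies, for illustration only.
toy_counts = {
    "in der nacht": 12,
    "at that time": 7,
    "ich weiß nicht": 9,
    "i don't know": 20,
    "auf der strasse": 6,   # ambiguous out of context, so left unlabeled
    "call from the": 2,     # below the frequency threshold
}
totals = categorize(toy_counts)
```

Because the labels are in English regardless of the source language, German, Yiddish, French, and English expressions end up in the same category, which is what allows the counts to be compared across a multilingual corpus.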

This labeling process is in no way conclusive: it is necessarily subjective and 
prone to errors, just like keyword-based indexing, and it benefits from recursive 
corrections and feedback, for example in categories like “narration” or “opinion” 
that seemed marked and expressive enough to be named but could not be con- 
sidered objective or clearly defined. In addition, an accurate analysis of frequent 
N-grams requires a few more processing steps (described in the next section) not 
yet included at this stage. Still, even an approximate analysis of the most frequent 
N-gram categories is productive to map out the main areas mentioned by inter- 
viewees: camp / suffering (678), don't know / don't remember / don't want (402), 
year (390), time (348), movement (224), Jews (141), Palestine (99), coercion (96), 
work (83), Germans (51), war (51), street (44), organizations (41), deportation (38), 
future / plans / family abroad (36), date (32), SS (23), liberation (22), ghetto (20), 
Russians (19), hunger (12), Americans (10), opinion (10), resistance (10), hiding (10). 
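
To compare the categories at a glance, the counts listed above can be ranked and expressed as shares of the total. The snippet below is a small sketch that uses only the figures given in the text; the variable names are illustrative, and the shares are relative to the labeled categories listed here, not to the corpus as a whole.

```python
# Category counts as listed in the text.
CATEGORY_COUNTS = {
    "camp / suffering": 678,
    "don't know / don't remember / don't want": 402,
    "year": 390,
    "time": 348,
    "movement": 224,
    "Jews": 141,
    "Palestine": 99,
    "coercion": 96,
    "work": 83,
    "Germans": 51,
    "war": 51,
    "street": 44,
    "organizations": 41,
    "deportation": 38,
    "future / plans / family abroad": 36,
    "date": 32,
    "SS": 23,
    "liberation": 22,
    "ghetto": 20,
    "Russians": 19,
    "hunger": 12,
    "Americans": 10,
    "opinion": 10,
    "resistance": 10,
    "hiding": 10,
}

total = sum(CATEGORY_COUNTS.values())

# Rank categories from most to least frequent (ties keep listed order).
ranked = sorted(CATEGORY_COUNTS.items(), key=lambda item: item[1], reverse=True)

# Share of each category among all labeled counts.
shares = {label: count / total for label, count in ranked}

for label, count in ranked[:5]:
    print(f"{label:<42} {count:>4} {shares[label]:6.1%}")
```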

Of these, perhaps the most surprising is the second largest category: a high 
number of recurring phrases across the four languages considered here are var- 
iations of I don't know, I don't remember, I don't want to [most frequently: go / 
say / tell]. These sentences point to a category of uncertainty that is not conclu- 
sively explained by simple question-answer exchanges where the interviewer 


18 In cases where I didn't have a direct knowledge of the language (Polish and Russian N-grams),
particularly frequent expressions were translated with Google Translate and only labeled if their
meaning seemed unambiguous and clearly related to an existing category, for example the many
instances of Russian N-gram “я не знаю,” “I don't know.”

asks about a detail that the interviewee doesn't know. They also appear in the
middle of the interviewees' narration, they interrupt the flow of a sentence, they 
give us a look into the interviewees' perspective as they explain that something 
was happening without their knowledge, or that they didn't know yet something 
was about to happen, or that they simply don't want to talk about something. 
Boder's interviews were conducted only about a year after the end of the war 
and in displaced persons camps, which means that most interviewees were not 
in a safe or stable environment and lived in a situation of uncertainty concerning 
their future, their livelihood, the whereabouts of their family. Many do not seem 
to have developed a cohesive narrative of their experience yet, and most of them 
had not yet had a chance to talk to people unfamiliar with what happened (as 
Boder was). It also seems likely that at this early stage, interviewees saw their 
experiences as individual pieces of a puzzle. While they had a sense of the scale
of the atrocities that happened around them, there were still many elements of the
larger historical picture they had not yet heard of, and many Holocaust terms we
would all become familiar with in later decades were not part of the public discourse
at the time of their interviews. This category of uncertainty calls our attention
to the unscripted and complex nature of the memories that are being narrated
and reminds us that uncertainty is only one of the many emotions survivors
display in their interviews. While the N-gram-based indexing system is not able 
to fully reference the emotions and body language of interviewees, it does point to 
their presence, and may encourage the researcher to take a closer look at specific 
interviews or corpus-wide interview passages. 

Another N-gram category that stands out in the Boder corpus concerns time 
expressions (as opposed to chronological dates). While keyword-based indexing 
often classifies all references to time under the same label, the N-gram-based 
approach shows that survivors seem to be doing two very different types of memory 
operations when they refer to time. Sometimes they give specific dates and factual 
historical information either in response to Boder's questions or to describe their 
experience for external listeners. They talk about dates or years like “on the 28th
of May” or “on the 30th of June 1941” or “in Summer 1943.” At other times, though,
their time references are highly subjective: they include vague durations, impressions
of time passing, narrative constructs like “very late,” “every night,” “for a
long time,” or “it was always like that.” The space these expressions take up in the
survivor's story is often disproportionate to the historical time they reference. At 
times, an overabundance of both subjective and objective time references signals 
episodes etched into the survivor's memory which arguably deserve to be marked 
as significant during the indexing process because that is the way the survivor 
talks about them. For example, 29-year-old Ephraim Gutman uses six different 
time expressions to tell Boder about one of the massacres he witnessed: “They 
killed them on Wednesday in the night and on Thursday in the night, it was the
first day of the month of Tammuz.”¹⁹ In this case, the N-gram-based approach
highlights the need for a nuanced model of representation and interpretation of
time to address what Johanna Drucker has described as “the complexities of lived,
reported, constituted, and produced temporality in human documents.”²⁰ More
generally, episodes like these offer us a glimpse into what survivors really wanted
to convey and show that indexing their words for their mere semantic content 
("they're talking about time") might obscure something important about their 
testimony. For reference, let us take a look at the indexing systems developed by 
Boder and by the Shoah Foundation. Based on an analysis of three of Boder's 
typewritten and annotated interview transcripts,²¹ Boder indexed his interviews
with at least 1,023 keywords, each of which was used on average nine times. His 
indexing was sparse and relatively inconsistent, and many of the words he anno- 
tated referred to people or location names. With his indexing system, expressions 
of subjective time, emotional intensity, and uncertainty are not visible, while con- 
cepts that were central to his research interests are much more evident (for example 
“TAT test,” a psychological test he developed, or “vermin” and “dead-handlers,” 
concepts he may have tracked because of his interest in trauma). 

As the purpose and scale of a project change, so does the format of indexing. 
For example, the USC Shoah Foundation Archive, which started a large-scale tes- 
timony videorecording project in 1994, adopted a two-tiered system with more 
specific terms and broader parent terms. This allows for the systematic annota- 
tion of numerous interviews with a wide range of speakers, languages, contents, 
and time of recording. If we look at Boder's corpus through the lens of the Shoah 
Foundation’s indexing system,²² what we see is yet another picture. Figure 1 compares
the categories we may use to describe Boder's collection with Shoah Foundation
keywords and N-gram-based categories.


19 "Dort haben sie ermordet Mittwoch in der Nacht und Donnerstag Nacht, es war der Erste 
des Monats Tammuz." Boder corpus, interview with Ephraim Gutman, September 12, 1946 (tran- 
script n. 37, line n. 15, translated). 

20 See Drucker's discussion on the need for overlapping and non-homogeneous models of time 
mapping in data-based humanities work in: Johanna Drucker, Visualization and Interpretation: 
Humanistic Approaches to Display (Cambridge, MA: MIT Press, 2020), 114. 

21 The digitization and categorization of typewritten keywords from Boder's notes was carried 
out by Lizhou Fan (UCLA Holocaust Studies DH Lab work, 2019). 

22 To compare N-gram-based keywords with Boder's, I use a version of Boder's keywords 
grouped under the USC Shoah Foundation's controlled vocabulary of parent terms (fewer in 
number and broader in meaning than Boder's heterogeneous and sporadic keywords). Lizhou 
Fan and Todd Presner matched Shoah Foundation parent keywords to Boder's terms with the 
aim to make the two indexing systems comparable (UCLA Holocaust Studies DH Lab work, 2019). 
Each indexing system highlights different aspects of the interview collection.
While the keyword-based one is helpful to track major thematic categories like
“discrimination responses” or “mistreatment and death,” the N-gram-based one
offers an insight into more structural categories of the survivors' testimony, such 
as their uncertainty, reticence, point of view, and emphasis on specific episodes. 


4 Interview Analysis 


Among all his interviews, Boder dedicated particular attention to his encounter 
with Abraham Kimmelmann, an 18-year-old survivor he spoke to in 1946. His 
interview is exceptional on all accounts: it is personal, philosophical at times, 
elaborate, and detailed. While most interviews range between 20 minutes and 
an hour, Kimmelmann spoke to Boder for almost 4 hours (and Boder planned to 
interview him again, although he never managed to). In his testimony, Kimmel- 
mann makes several personal observations of moral and philosophical character 
about the events he experienced, which is unique given Boder's habit of insisting 
his interlocutors only focus on precise facts and events they witnessed personally. 
This is also one of the few interviews that Boder processed for publication in his 
Topical Autobiographies²³ and annotated with his indexing system. The unusual
features of this interview, together with the amount of work Boder dedicated to 
it, make it a useful example to compare the keyword-based indexing system with 
the N-gram-based one. 

For the analysis of Kimmelmann’s interview, the N-gram system was improved 
with some consideration for the exact frequency of each N-gram and the similar- 
ity between comparable expressions. Only Kimmelmann’s lines were considered, 
while Boder's questions and comments were left aside. After filtering out seman- 
tically poor segments mostly consisting of conjunctions, pronouns, and auxiliary 
verbs, two drawbacks of raw N-grams needed to be addressed for an accurate fre- 
quency count: frequency underestimation in some cases and overestimation in 
others. On the one hand, the presence of slightly different versions of the same 
N-gram (such as “ein jüdischer Miliz” vs. “die jüdische Miliz”) leads to the risk 
that a really frequent expression might get lost among many variants that are not 
counted together just because they are not identical word by word. On the other 
hand, the overlap of N-grams of different sizes obtained from the same sentence 
distorts the final count and makes rare expressions appear more frequent than 
they really are. For instance, we might be led to believe that the town of Bensburg 


23 Boder, Topical Autobiographies. 
is of incredible importance in Kimmelmann’s testimony because it appears in 212
N-grams, while in fact he only mentions the town 25 times. Most N-gram instances
of the word are snippets of the same sentence: for example, for the sentence “Essen
war billiger als in Bensburg” (“food was cheaper than in Bensburg”),²⁴ we may
obtain N-grams “war billiger als in Bensburg,” “billiger als in Bensburg,” “als in
Bensburg,” etc.
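
The overcounting effect can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and N-gram sizes from 2 to 5; both choices, like the function names, are hypothetical and are not taken from the pipeline used for this chapter.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def all_ngrams(tokens, sizes=(2, 3, 4, 5)):
    """Collect N-grams of several sizes (the sizes are an assumption)."""
    grams = []
    for n in sizes:
        grams.extend(ngrams(tokens, n))
    return grams


# Example sentence from the Kimmelmann interview, split on whitespace.
tokens = "Essen war billiger als in Bensburg".split()

grams = all_ngrams(tokens)

# N-grams that contain the place name vs. actual mentions of it.
grams_with_town = [g for g in grams if "Bensburg" in g]
mentions = tokens.count("Bensburg")

print(len(grams_with_town), "N-grams contain the word;", mentions, "actual mention(s)")
```

Because the place name closes the sentence, every N-gram size contributes one overlapping snippet for the same single mention, which is why raw N-gram counts overstate how often the town is named.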

To improve the reliability of the frequency count, similar N-grams (such as 
“jüdische Gemeinde” and “bei der jüdischen Kultusgemeinde,” both designating
Jewish religious communities) were grouped together using a similarity algorithm.²⁵
Any two N-grams were considered passably similar if they fulfilled these
conditions: their similarity score was > 0.75; they appeared in different lines of
the interview; or if they appeared in the same line, they were not subsets of each
other (as “und meine Mutter” compared to “ich und meine Mutter”). The resulting
list was manually filtered to exclude expressions that were clearly unrelated
despite matching these criteria on a formal level, like “meine Mutter” and “ein
Meter.”
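
The grouping rule described above can be sketched in Python with difflib's SequenceMatcher, which footnote 25 names as the similarity algorithm. The function names, the line-number arguments, and the reading of the conditions (a score above 0.75, combined with either different lines or, within one line, neither N-gram being contained in the other) are assumptions made for illustration; this is not the code actually used for the chapter.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.75  # value reported in the text; described there as provisional


def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def is_subset(a, b):
    """True if one N-gram is literally contained in the other."""
    return a in b or b in a


def passably_similar(a, line_a, b, line_b, threshold=THRESHOLD):
    """Decide whether two N-grams should be counted together.

    line_a and line_b are the (hypothetical) interview line numbers
    in which each N-gram was found.
    """
    if similarity(a, b) <= threshold:
        return False
    if line_a != line_b:
        return True
    # Same line: overlapping snippets of one sentence are not merged.
    return not is_subset(a, b)


# Variants of the same expression found in different lines are grouped.
print(passably_similar("ein jüdischer Miliz", 10, "die jüdische Miliz", 42))

# Nested snippets from the same line are kept apart.
print(passably_similar("und meine Mutter", 7, "ich und meine Mutter", 7))
```

A purely formal score of this kind cannot tell related expressions from accidental look-alikes, which is why the manual filtering step remains necessary.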

The remaining N-grams were manually analyzed and grouped to identify 
frequent units that potentially indicate the main themes or expressions of the 
interview. A comparison between N-gram-based results and Boder's own annota- 
tions of this interview clearly shows the benefits of adding an alternative content 
indexing system (Figures 2 and 3). 

Boder annotated each page of this interview with nouns he found particu- 
larly significant, like objects, people, concepts abstracted from Kimmelmann's 
narration. A look at the terms he recorded (grouped under the Shoah Foundation 
project's parent keywords) reveals his focus on factual details, like personal and 
geographical names, and on traumatic categories of mistreatment or life under 
military occupation. This choice is sensible when one thinks of the aims Boder 
was trying to pursue: a quantification of survivor trauma and a historical report of 
the events. However, these terms do not fully capture what the survivor, Kimmel- 
mann, is talking about. A look at the results of N-gram-based indexing done on 
the same interview shows that several significant aspects of the interview escape 
a predetermined keyword-based approach. 

N-gram-based keywords return a completely different image of what the inter- 
view is “about.” First, we see a strong narrative component to Kimmelmann's 
testimony. Unlike other survivors, he is not just answering questions, but build- 


24 Boder corpus, interview with Abraham Kimmelmann, August 27, 1946 (transcript n. 56, line 
n. 277). 

25 SequenceMatcher algorithm of Python's difflib package. A 0.75 threshold was set after a few 
tries and may be improved. 
Figure 2: Boder's indexing of Kimmelmann's interview.

Source: Full interactive diagram on Tableau Public (Anna Bonazzi). Boder's indexing of
Kimmelmann's interview: accessed June 30, 2021, /web/20210630014531/https://public.tableau.com/app/profile/anna5558/viz/KimmelmannkeywordsBodervsN-grams/Dashboard1.
Figure 3: N-gram-based indexing of Kimmelmann's interview.

Source: Full interactive diagram on Tableau Public (Anna Bonazzi). N-gram-based indexing
of Kimmelmann's interview: accessed June 30, 2021, /web/20210630014531/https://public.tableau.com/app/profile/anna5558/viz/KimmelmannkeywordsBodervsN-grams/Dashboard1.
ing a long narration that draws the listener into his experience (as we can easily
confirm with a close reading of the interview). He often invites Boder to see things
from his perspective, repeating expressions such as “imagine this,” “picture this,”
“you can't imagine how ....” While defining this category does require subjective
judgment, N-grams help us notice Kimmelmann’s recurring turns of phrase. 

Another interesting category of this visualization is Kimmelmann’s interest 
in the agency and moral involvement of other Jews in the events he experienced. 
He talks about Jewish militias, he talks about the ambiguous role of the Jewish 
Council, and he is really concerned with the morally gray area that many Jews 
found themselves in when they had to decide what they were willing to do to save 
their lives, or what they felt enabled to do under the circumstances. Boder does 
not index this category (he indexes separate references to people and organiza- 
tions but does not view this as a topic on its own). While a topic like this might 
go unseen in manual indexing given its borderline taboo character, the frequen- 
cy-based approach makes it easier to notice it, name it, and index it (close reading 
confirms this is in fact a key part of this testimony). 

A third category that stands out in this interview is the “I don't know / I 
didn't know" cluster. The strong presence of this category even in an interview 
that is mostly self-regulated and contains comparatively few direct questions by 
the interviewer confirms the narrative role of uncertainty and reticence in early 
testimonies, as already observed in the corpus-wide analysis from the previous 
section. These three examples represent central aspects of Kimmelmann’s tes- 
timony, but they don't easily stand out in a manual keyword-based analysis like 
that of Boder. 


5 Conclusion 


This chapter discussed an alternative method to index the content of large inter- 
view collections based on N-grams (as opposed to manually assigned keywords), 
particularly in the case of genocide survivors' testimonies. This method helps us 
expand our understanding of what “content” is and what parts of a testimony 
should be annotated in addition to verbal, mostly noun-based concepts. This 
chapter showed that N-gram-based indexing helps identify categories like narra- 
tion, uncertainty, reticence, emphasis, emotional insistence on time references, 
and topics we may not be predisposed to manually index, like morally gray areas. 
This method strongly emphasizes frequency as a marker of the importance of 
an expression, an approach that is not free from drawbacks: it is important to 
remember that frequency can be misleading, and that the frequency of an expression
is not the only way to determine what is important in an interview, especially
one that deals with difficult or even traumatic personal memories. Yet, with its 
focus on recurring sentence units, N-gram-based indexing helps us center the 
survivors' words and expression over our expectation of a certain narrative and 
expand indexing to include structural, non-verbal aspects of their testimony. 

This method touches upon questions at the heart of the digital turn for Jewish 
Studies, and the humanities more broadly: how can we use digital research 
methods in an ethical way when we approach the study of Holocaust interviews, 
and more generally the testimonies of genocide survivors? What pitfalls should 
we avoid to ensure our computational approaches do not reduce survivors and 
their life stories to points or numbers in a spreadsheet? As this chapter has tried to 
show, a thoughtful use of computational research methods in fact has the poten- 
tial to humanize its objects of study and bring them closer to the spectator in 
ways that are sometimes not accessible to manual analysis in the case of large 
archives. By combining qualitative analysis with computational text analysis, 
N-gram-based indexing offers the spectator a richer and more nuanced access 
to survivors' memories, and by highlighting the relevance of interview features
like emotion and dialogical interaction, it points to multiple potential directions 
for computational research on testimonies also beyond Holocaust and genocide 
studies. 
Mapping Forced Academic Migration


Abstract: In spring and early summer 1933, after the “Law for the Restoration of
the Professional Civil Service” was passed on April 7, some 3,000 academics, the
largest part of them Jewish, lost their positions in German universities and other
research institutions. Despite the strict immigration laws, which were a result
of World War I, over 50% of the dismissed academics (the numbers vary greatly
among the different studies) left Germany and tried to continue or rebuild their
academic careers in exile. The largest part migrated to the UK and the US, but
France, Sweden, Turkey, the British Mandate of Palestine, and Switzerland
were also destination countries.

We are digitally mapping the geographical movement of academics triggered 
by mass expulsions from Germany, Austria, and other countries occupied by the 
Third Reich and shedding light on inner-institutional changes in respect to both 
academic staff and the academic standing of research institutions. In our contribution
we shed light on the case of the mathematicians at Göttingen University
in the early years of the “Third Reich” and the impact of National Socialist university
politics on the field of mathematics. These are initial results of our larger
study which help to identify patterns and ruptures of forced academic migration. 


Keywords: forced academic migration, migration studies, exile, national socialism, 
World War II, university history 


1 Introduction 


The German philosopher Theodor W. Adorno (1903-1969), who, as a “non-Aryan”, 
was forced into exile by the National Socialists, wrote in his book Minima Moralia, 
“Every intellectual in emigration is, without exception, mutilated, and does well
to acknowledge it to himself, if he wishes to avoid being cruelly apprised of it
behind the tightly-closed doors of his self-esteem.”¹ He was speaking of his own


1 Theodor W. Adorno, Minima Moralia: Reflections from Damaged Life (London: Verso, 2005), 33. 


Note: The authors' names are listed in alphabetical order; all the authors contributed equally. 
The authors would like to thank Aleksandra Petrović and Sebastian Borkowski for their support 
with the database and data visualization. 


Open Access. © 2022 Sinja Clavadetscher et al., published by De Gruyter. This work is
licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
https://doi.org/10.1515/9783110744828-006 
biographical disruptions, the expropriation of language,² and political isolation
and career breaks. But this experience was not his alone. Over 3,000 German aca- 
demics, most of them Jewish either by self-identification or the racial laws of Nazi 
Germany, were dismissed from their positions, based on paragraph 3 of the Law 
for the Restoration of the Professional Civil Service. After April 7, 1933, Jews, other 
“non-Aryans”, and political opponents (under paragraph 4) could no longer serve 
as teachers, professors, and judges or in other governmental positions. Around 
50% of dismissed scientists emigrated and tried to continue their careers abroad. 
This was likely the largest migration of academics in human history. The émigrés
faced, as we have seen with Adorno, alienation, racism, impediments, and
often psychological difficulties in exile, in both their private and their professional
lives. However, precisely because emigration emerges from a crisis, it can
bring about innovations and result in scientific and social achievements which,
according to the historian Dan Diner, “would have hardly been expected under
conditions of steadiness and continuity of life plans.”³

The biographical perspective of the impact of this forced migration during 
the 1930s and 1940s on individual careers is one of three main objectives of our 
research project titled “Science Transnational. Switzerland and the Academic 
Forced Migrants from 1933 to 1950.”⁴ We are equally interested in the way the transnational
academic network and the academic landscape as such changed due to
the expulsion of academics. Therefore, we are digitally mapping the geographical
movement of academics triggered by mass expulsions from Germany, Austria, and
other countries occupied by the Third Reich and shedding light on inner-institutional
changes in respect to both academic staff and the academic standing of
research institutions.⁵ Researchers agree that the mass expulsion of academics,
some of them Nobel laureates, was an “intellectual decapitation of Germany”⁶


2 On this topic, also see: Stephan Braese, "Deutsche Sprache, jüdisches Exil - Optionen von 
‘Identität’ nach 1933," in Exilerfahrung und Konstruktionen von Identität 1933 bis 1945, ed. Hans 
O. Horch, Hanni Mittelmann, and Karin Neuburger (Berlin: De Gruyter, 2013), 7-16. 

3 Dan Diner, “Einleitung,” in Tel Aviver Jahrbuch für deutsche Geschichte 27 (1998), 3 (Translated 
from German by StM). 

4 This project was funded by a five-year PRIMA-grant (No. 179819) from the Swiss National Sci- 
ence Foundation. 

5 While we have included screenshots of the data-visualization throughout this article, on the 
project website, we have prepared corresponding scenarios for more explorative and dynamic 
data analyses, and the website data is updated daily. 

6 Helge Pross, “Die geistige Enthauptung Deutschlands. Verluste durch Emigration,” in Nation- 
alsozialismus und die deutsche Universität. Universitätstage 1966 (Berlin: De Gruyter, 1966), 143. 
and a “dismantling of German science,”⁷ particularly the modern disciplines and
sub-disciplines, such as economics, political science, sociology, atomic physics,
and biochemistry. In these areas, the dismissal rate was up to 50% since the new
research areas were represented to a higher degree by younger and often Jewish
scholars.⁸ The universities “surrendered without a fight”⁹ to the drastic curtailment
of their autonomy,¹⁰ and there was no public opposition to the forced dismissals
from either colleagues or other universities. On the contrary, in April 1933,
the boards of directors of German universities and, a little later, the universities
expressed their approval of Adolf Hitler and the National Socialist state,¹¹ and at
some universities, around 25% of all university teachers had joined the party by
the summer of 1933.¹² At that time, among the party members, there were relatively
few tenured professors, while the number of assistants, private lecturers,
and associate professors was comparatively high. There were two reasons for this:
first, the National Socialist university policy clearly represented the interests of a
non-established younger generation against the tenured professors (Lehrstühle),
and second, the mass dismissals improved the previously miserable career opportunities
of young academics.¹³

While German research institutions suffered a (self-inflicted) loss of research- 
ers, knowledge, and academic reputation, other countries had the rare opportu- 
nity to gain some of the best scientists worldwide. Our large-scale data collection 
will allow for a broad transnational evaluation of different states and universities 
with a special focus on the position of Swiss universities in the changing aca- 
demic landscape. There has been a longstanding connection between German 
and Swiss universities with a strong intellectual as well as personal exchange. 
This, together with the similar structural design of the university systems,
made Switzerland a likely emigration destination for many expelled scholars.
Thus far, however, neither scholars of forced academic migration nor scholars
of Swiss history have paid attention to émigré scholars in Switzerland or to the
way Swiss universities dealt with them. In this article as well as in our larger


7 Karl Dietrich Bracher, Die deutsche Diktatur: Entstehung, Struktur, Folgen des Nationalsozialis- 
mus, (Frankfurt am Main: Verlag Ullstein, 1980), 294. 

8 Claus-Dieter Krohn et al., eds., Handbuch der deutschsprachigen Emigration 1933-1945 (Darmstadt: 
WBG, 2008), 681-82. 

9 Michael Grüttner, “Die deutschen Universitäten unter dem Hakenkreuz,” in Zwischen Autonomie
und Anpassung: Universitäten in den Diktaturen des 20. Jahrhunderts, ed. John Connelly (Paderborn:
Schöningh, 2003), 67.

10 Grüttner, “Die deutschen Universitäten unter dem Hakenkreuz,” 74.

11 Grüttner, “Die deutschen Universitäten unter dem Hakenkreuz,” 74-76. 

12 Grüttner, “Die deutschen Universitäten unter dem Hakenkreuz,” 73. 

13 Grüttner, “Die deutschen Universitäten unter dem Hakenkreuz,” 76. 
project, we will be putting Swiss universities and exiled scientists in Switzerland
on the map of forced academic migration. 

We are working with nodegoat,¹⁴ a web-based research environment for the
humanities that allows for relational modes of data analysis with spatial and
chronological forms of contextualization. Our research question-driven data model
enables us to store all known personal and professional data of the dismissed scholars,
including not only their career steps at academic institutions across the globe
but also their publications, presentations, and research project collaborations,
which allows us to illustrate transnational academic migration and shifts in global
academic networks, shifting scientific centers, and the transfer of knowledge. This
article contains links to interactive data visualizations; we recommend reading it in
conjunction with the online visualizations.¹⁵

In this article we will shed light on the case of the mathematicians at Göttingen
University in the early years of the “Third Reich” and the impact of National
Socialist university politics on the field of mathematics. These are initial results 
of our larger study which help to identify patterns and ruptures of forced aca- 
demic migration. At a later stage, it will be part of a comparative study on forced 
academic migration in which we will be able to make comparisons between dif- 
ferent academic fields, answer questions regarding the extent to which age and 
seniority had an influence on successfully continuing a career in exile, as well 
as shed light on the situation of female scientists after their expulsion from their 
positions. 

In the first section, we will digitally trace in some detail the direct results of 
the declaration of the Law for the Restoration of the Professional Civil Service on 
the mathematicians at Göttingen University and focus on the changing academic
reputation of the Göttingen Institute for Mathematics.¹⁶ In the second section,
focusing on Switzerland and based on two examples, we explore the interdepend- 


14 “nodegoat,” lab 1100, accessed February 9, 2021, https://nodegoat.net. 

15 Accessed January 11, 2022, https://forced-academic-migration.net/datapublications/datapublications.p/293.m/tag/Mathematics.

16 The history of mathematics in Germany and in Göttingen has been studied extensively.
See also: David E. Rowe, “Jewish Mathematics at Göttingen in the Era of Felix Klein,” Isis 77
(1986): 427-49; Arnold Dresden, “The Migration of Mathematicians,” The American Mathematical
Monthly 49, no. 7 (1942): 415-29, accessed February 9, 2021, doi:10.2307/2303266; Colin R.
Fletcher, “Refugee Mathematicians: A German Crisis and a British Response, 1933-1936,” Historia
Mathematica 13 (1986): 13-17; Louise Grinstein and Paul J. Campbell, eds., Women of Mathematics:
A Biographical Sourcebook (New York: Greenwood Press, 1987); Max Pinl and Lux Furtmüller,
“Mathematicians under Hitler,” The Leo Baeck Institute Yearbook 18, no. 1 (1973): 129-82;
David E. Rowe, “Klein, Hilbert and the Göttingen Mathematical Tradition,” Osiris 2nd ser., 5, no.
1 (1989): 186-213.
encies of Swiss universities and/or professors working at Swiss universities within
the transnational network of science and show to what extent our digital approach
also stimulates questions about Swiss appointment policies. Swiss universities
had a long history of appointing German scientists as a means to strengthen their
academic reputation, and Swiss and German academia were therefore closely
interlinked. In the course of World War I, however, Swiss immigration policies
became more restrictive. The field of mathematics serves as an example to study
the changing interlinking of Swiss universities in the transnational network of
academia before and during the National Socialist period; this also helps us to
understand to what extent Swiss universities were able and willing to hire exiled
mathematicians.


2 A Period of Transition? The Göttingen
Institute for Mathematics in the 1930s 


The Institute for Mathematics at the Georg-August-Universität Göttingen has a 
long history of excellence, reaching back to the 18th century. Along with Berlin, 
Göttingen was one of Germany's, and one could argue the world's, main centers
for mathematical research. Despite prejudice against Jews before Hitler's rise to
power, the University of Göttingen hired several Jewish mathematicians, some of
whom were full professors. In April 1933, when the Law for the Restoration of the Professional
Civil Service was passed, their positions were in jeopardy. In 1934,
a mere year after the mass dismissal, David Hilbert (1862-1943), the former head
of the Göttingen Institute and the “grand old man of German mathematics,” was
asked by Bernhard Rust (1883-1945), the German minister of education, how
mathematics at Göttingen was, now that it was free from Jewish influence. Hilbert
allegedly replied: “There is no mathematics in Göttingen anymore.”¹⁷

In this first section, we explore the question of whether and how this much- 
quoted statement by Hilbert can be verified based on our relational database, 
which not only includes the personal data of academics and their career steps 
linked to institutions but also information on the reasons for their changing positions.
The question of the Göttingen Institute's global standing after 1933 thematically
overarches this analysis.


17 Constance Reid, Hilbert: With an appreciation of Hilbert's Mathematical Work by Hermann 
Weyl, 4th ed. (New York: Springer, 1970), 205. 
Figure 1: Reasons for termination at German universities, 1933-1939.


Figure 1 shows the various reasons for changes in professional positions at 
German universities between 1933 and the invasion of Poland in 1939.¹⁸ Using pie
charts in these figures allows an immediate assessment of not only the different
reasons for quitting a position during this period but also of the differences
between universities. For increased clarity, the various reasons for change have 
been grouped thematically as colored pie slices. 


18 Based on our classifications in nodegoat, the reasons for termination were grouped themati- 
cally. This is a brief explanation of the Figure 1 and 2 legends. BBG/RBG: Dismissals on the basis of 
the Law on the Restoration of Professional Civil Service of April 7, 1933 and the Reich Citizenship 
Law of September 15, 1935. National Socialist Policies (General): Cases attributable to National 
Socialist policies in general. Self-Termination: All persons who resigned from their positions. Dis- 
missal Unknown Reason: Reason for dismissal could not yet be determined. Retirement/Emeri- 
tus: retired or emeritus. Change in Position: This category includes terminations due to a change 
of academic position; for example, if one had completed a qualification (PhD, habilitation, etc.). 
Various Reasons: Individual cases that do not belong to any of the above-mentioned categories 
(health reasons, outbreak of war, etc.). Reason Unknown: Cause of termination is unknown. 
Particularly noticeable are the white areas, which indicate cases where the
causes are still unknown, and the red slices, which stand for dismissals due to
the Law for the Restoration of the Professional Civil Service (BBG) in 1933 and
the Reich Citizenship Law (RBG) enacted in 1935. To increase the readability
and enable us to classify and compare Göttingen to other German universities,
the following map (Figure 2)¹⁹ only focuses on reasons related to the National
Socialists’ seizure of power.²⁰ From the simplified visualization of the data in
Figure 2, we can conclude that whereas a considerable number of universities,
such as Frankfurt am Main or Heidelberg, lost their mathematicians due to the
new legislation (i.e., the BBG and the RBG), in Göttingen the reasons were more
varied.²¹

From Figure 2, we can indeed confirm the hypothesis that many mathema- 
ticians left Göttingen due to National Socialist policies in general, but not all of 
them left with manifest reference to the BBG or RBG. 

Given its reputation, we have to assume that the departures of most of the 
faculty members and senior researchers at the Göttingen Institute of Mathemat- 
ics had an impact on the academic standing of mathematics at the University of 
Göttingen. One way to trace this potential shift in the significance of the Göttin- 
gen Institute is to display the researchers’ career steps before and after 1933. For 
this purpose, the career paths of the mathematicians were traced on a world map 
and presented in chronological sequence (Figures 3-5). The colored dots indicate 
career positions, meaning places of graduation or scientific appointments. Their 
size corresponds to the frequency of career positions at this location - the larger 
the dot, the more mathematicians' career steps are located at this institution. The 
lines connect the temporal and geographic dimensions, creating an interactive 


19 The interactive version of this figure is the same as that of Figure 1. Clicking on the different
“reasons” in the legend shows or hides them:
https://forced-academic-migration.net/datapublications/datapublications.p/293.m/17/mathematicians-reason-for-termination.

20 Lemercier and Zalc emphasize the importance of simplifying visualizations as much as pos- 
sible to facilitate their readability, which is true for the maps in this article. Claire Lemercier and 
Claire Zalc, Quantitative Methods in the Humanities: An Introduction (Charlottesville: University 
of Virginia Press, 2019), 127-28. However, simplifications also carry the risk of misinterpretation 
or overestimation of the factors presented. It is therefore extremely important to draw the read- 
er's attention to the way in which the presentation has been simplified and what it does or does 
not show. For example, Figure 2 focuses only on certain reasons, and it should not be forgotten 
which other reasons for dismissal also occurred but are not essential for the current research 
interest. 

21 State of work in progress as of January 2021: the focus lies on Göttingen mathematicians. The
online version of the visualization is continuously evolving. 
Figure 2: Selected reasons for termination at German universities, 1933-1939.


map of mathematicians' movements. In the online version of the map, the movements
change over time and their directions are visible too.²²

Until the end of 1932 (Figure 3), many mathematicians' careers passed through
Göttingen, represented on the one hand by the relatively large dot at the University
of Göttingen and on the other hand by the many lines leading to and from Göttingen.
In 1933, however, the picture changed abruptly (see Figure 4). Although Göttingen
is still mapped, its position becomes marginal. The lines now tend to move away
from Göttingen toward Switzerland, the United Kingdom, and, especially, the USA.
It is worth noting that after 1933, many mathematicians, as shown in Figure 4, immigrated
directly to the USA. Figure 5 confirms this trend for the years 1935-1945,
albeit with slight geographical shifts. Rather than leading only from Germany directly to
other states, first and foremost to the USA, the lines also connect dots in Switzerland
and Great Britain with the USA.


22 The interactive versions of Figures 3-5, accessed January 11, 2022:
https://forced-academic-migration.net/datapublications/datapublications.p/293.m/16/geographic-movement-of-gttingen-mathematics.
Figure 3: Movements of Göttingen mathematicians, 1925-1932.
Figure 4: Movements of Göttingen mathematicians, 1933-1934.


Thus, Figure 4 suggests that the mathematicians' career paths often did 
not end at their first station of exile. Often, the academics had to move several 
times before settling down definitively, which follows from the shown move- 
ments in Figures 3-5. The questions of how many mathematicians managed to 
proceed directly to the USA and why this was the case remain to be answered 
in future research. However, connections between Göttingen and research institutions
in the USA before 1933, as shown in Figure 3, can provide first insights:
personal networks played a significant role in the process of (forced) academic 
Figure 5: Movements of Göttingen mathematicians, 1935-1945.


migration.²³ The visualization emphasizes that, as well as research institutions in
the USA, universities in Switzerland seemed to be, at least temporarily, a center 
of attraction for mathematicians. This raises questions about the determining 
pull factors for the mathematicians' immigration decisions, such as immigration 
policies, aid organizations, personal and professional networks, and university 
policies. The relevance of these factors, particularly that of university policies in 
the emigration process of dismissed scientists, in this case, mathematicians, will 
be discussed in the following section based on the example of Switzerland. 


3 Transnational Linkages and Local 
Idiosyncrasies: Mathematics Professors 
at Swiss Universities 


In geographical, linguistic, and cultural terms, Switzerland was an obvious 
and important emigration destination for German-speaking academics. After 
April 7, 1933, Swiss university members received numerous personal visits and 
letters from German scholars (and from Swiss professors employed in Germany), 
who had been dismissed for racial and/or political reasons, hoping to take up 
teaching and research activities in Switzerland. How Swiss universities dealt 


23 The contact existing between Göttingen and the USA, visualized by gray lines between the
University of Göttingen and various places in the USA, originates from Roland Richardson, who
was a postdoctoral student with David Hilbert in Göttingen (nodegoat database).
with these forced academic migrants is the subject of this section, again focusing
on the field of mathematics. In the following, we will focus on two examples: first, 
the interconnections of Swiss universities and professors working at Swiss univer- 
sities within the transnational network of academia and, second, the development 
of Swiss appointment policy(s) in regard to exiled scholars. As in the first section, 
our analysis is built upon a nodegoat database, in this case with information about 
academics in Switzerland. 

Figure 6 is a geographical visualization of the career paths of all mathematicians
who held a full professorship at a Swiss university between 1900 and 1950.²⁴
Consequently, this map visualizes the (inter)national mobility of this group of 
mathematicians in the first half of the 20th century. The green dots symbolize the 
university cities, and the larger the dot, the more career moves mathematicians
made through the university. The sequences of the moves are indicated by
the blue arrows.²⁵

Due to the focus on Switzerland, the Swiss university cities are recognizable as 
the largest dots, particularly Zurich, since it is home not only to a university but also 
to the Swiss Federal Institute of Technology (ETH). This visualization illustrates the 
close ties between German and Swiss universities over the entire period. Particularly 


Legend: University Cities; Career Paths of Mathematicians


Figure 6: Career paths of full professors in mathematics at Swiss universities, 1900-1950.


24 All academic positions (and, if known, the place of study) of professors at the universities of 
Basel, Bern, Fribourg, Geneva, Lausanne, Neuchâtel, Zurich, and the Swiss Federal Institute of
Technology in Zurich are included; however, the data are not yet complete. 

25 Link to interactive visualization: Mathematics (full professors) at Swiss Universities:
https://forced-academic-migration.net/datapublications/datapublications.p/293.m/18/mathematics-full-professors-at-swiss-unviersitites.
striking is the exchange with the important Göttingen Institute of Mathematics. A
large number of mathematicians who were full professors at a Swiss university had
completed part of their studies in Göttingen or held a faculty position there. Among
them was the German mathematician Hermann Weyl (1885-1955). From 1913, he
was a full professor at the ETH. He then returned to Göttingen in 1930 upon receiving
a call from his alma mater. Due to the increasing threat of National Socialism,
Hermann Weyl, his wife Helene Weyl-Joseph (1893-1948), a Jewish writer and
translator, and their children migrated to the USA. Through mediation, by Albert
Einstein among others, he managed to obtain a position at Princeton University.²⁶

The visualization also makes it apparent that there was not only movement to 
the USA, but also back to Europe. One reason for this is temporary research stays, 
such as the one of George Pólya (1887-1985), a mathematician whose parents con- 
verted from Judaism to Catholicism before his birth. A full professor of mathemat- 
ics at the ETH, he went to Princeton University as a Rockefeller Fellow in 1933 and 
then returned to his position in Switzerland the same year. Like other scholars 
working in Switzerland, George Pólya used the connections established during 
his time abroad to finally emigrate to the USA in 1940 in response to the political 
situation in Europe and the widespread fear amongst people of Jewish descent in 
Switzerland that Nazi Germany might invade the country. There, he taught and 
researched at Stanford University until 1953.²⁷

In addition to studying these transnational networks by means of geograph- 
ical visualizations, developments in the appointment policy(s) of Swiss universi- 
ties can also be made accessible through chronological visualizations, as Figure 7 
illustrates. It depicts the total number of full professors in mathematics in Swit- 
zerland (vertical blue lines) and their distribution among various universities 
(horizontal colored lines/dots) between 1914 and 1950. Personnel changes in the 
mathematical institutes — appointments as well as departures — can be quickly 


26 Hermann Weyl studied mathematics in Munich and Göttingen. There he also received his 
doctorate and habilitation. From 1913 to 1930 he was a full professor at the ETH, and from 1930 to 
1933 at the University of Göttingen. Preempting his dismissal, Weyl submitted his application for
discharge in October 1933. He then worked at the Institute for Advanced Study in Princeton until 
1951. Reinhard Siegmund-Schultze, Mathematicians Fleeing from Nazi Germany: Individual Fates 
and Global Impact (Princeton, NJ: Princeton University Press, 2009), 56. 

27 George Pólya was born in Budapest, Hungary, in 1887. He studied and received his doctorate 
in Budapest. From 1914, Pólya was a private lecturer at the ETH in Zurich and a full professor 
from 1928. He then emigrated to the USA, where he first worked at Brown University and then 
Stanford University, including previous research stays in Vienna, Paris, Cambridge, and Prince- 
ton. David Gugerli, Patrick Kupper, and Daniel Speich, Die Zukunftsmaschine: Konjunkturen der 
ETH Zürich 1855-2005 (Zurich: Chronos-Verlag, 2005), 242; Gerald L. Alexanderson, The Random 
Walks of George Pólya (Washington, DC: Mathematical Association of America, 2000). 
grasped through the visual representation. After a steady increase in the number
of full professors in mathematics until 1933, years of stagnation and decline fol- 
lowed. From 1944 onwards, a renewed increase can be observed: vacant positions 
were filled and additional positions were created. This breakdown by university 
also allows the disciplinary focus of the universities to be filtered and compared. 
In addition, it is possible to make a subdivision according to gender to show the 
changing positions of women within the professoriate.²⁸


Time axis: 1914 to 1950. Legend: Career [Date/Location]; Mathematics (Basel, Bern, Zurich); Mathematics (ETH Zurich); Mathematics (Geneva, Neuchâtel, Lausanne); Mathematics (Fribourg)


Figure 7: Total number of full professors in mathematics and their distribution amongst Swiss 
universities, 1914-1950.


Regarding the question of what appointment policy(s) the Swiss universities 
pursued from 1933 to 1945 and what positions were taken toward academic 


28 Sophie Piccard became the first female full professor in mathematics at a Swiss university 
in 1943. Christine Riedtmann, “Wege von Frauen: Mathematikerinnen in der Schweiz,” in Math.ch/100:
Schweizerische Mathematische Gesellschaft 1910-2010, ed. Bruno Colbois, Christine
Riedtmann and Viktor Schroeder (Zurich: European Mathematical Society Publishing House, 
2010), 403-21. 
migrants, the visualization in Figure 7²⁹ indicates that there was only one appointment
of a full professor in the field of mathematics in 1933, that of the Austrian
mathematician Anton Huber (1897-1975), who was not an academic
migrant. Huber had been an associate professor at the University of Freiburg in
Switzerland since 1928. In 1935, he joined the Nationalsozialistische Deutsche
Arbeiterpartei (NSDAP)-Switzerland,³⁰ and in 1938, after the Anschluss of Austria,
he accepted a position at the University of Vienna.³¹ Huber's appointment as a
full professor was the last of a foreign mathematician until after the end of the
war, which indicates that none of the mathematicians dismissed in Germany
since 1933 could continue their scientific activities in Switzerland, at least not
in the position of full professor. Support was provided, for example, in the form
of temporary teaching assignments. The ETH gave one such assignment to the
Jewish-Swiss mathematician Paul Bernays after he was dismissed from his position at the
University of Göttingen in 1933. Despite interventions by Weyl on Bernays's behalf,
the ETH did not promote Bernays until after the end of the war. Bernays received
tenure in 1945, when he was appointed associate professor. His academic positions,
however, did not correspond to his scientific importance.³²

How these examples can be located within the appointment policy(s) of Swiss 
universities or to what extent they indicate a discipline- or university-specific 
appointment practice can only be answered once all the necessary data have been 
recorded. What is already apparent, however, is that the use of digital methods 
not only generates new research questions but also promotes stimulating and con- 
structive exchanges between different research projects. 


29 To obtain a clearer visualization, the universities were divided into groups: 1. German-speaking
universities (Basel, Bern, and Zurich); 2. French-speaking universities (Geneva, Lausanne,
and Neuchâtel); 3. the only Catholic and German-French speaking university, Fribourg; and 4.
the Federal Institute of Technology in Zurich. The University of St. Gallen, which did not receive
the right to award doctorates until 1939, was not (yet) involved in these investigations.
Link to interactive visualization: Distribution of mathematics (full professors) at Swiss universities:
https://forced-academic-migration.net/datapublications/datapublications.p/293.m/20/distribution-of-mathematics-op-at-swiss-universities.

30 In 1932, the Swiss National Group of the NSDAP was formed and subordinated to the NSDAP
Foreign Department in Germany. Its purpose was to integrate Germans living abroad into the
National Socialist system, but it also served intelligence purposes. The NSDAP was only banned
in Switzerland shortly before the end of the war. Catherine Arber, “Frontismus und Nationalsozialismus
in der Stadt Bern: Viel Lärm, aber wenig Erfolg,” Berner Zeitschrift für Geschichte und
Heimatkunde 65, no. 1 (2003): 7-8.

31 Roman Pfefferle, Glimpflich entnazifiziert: Die Professorenschaft der Universität Wien von 
1944 in den Nachkriegsjahren (Göttingen: V&R Unipress, 2014), 291.

32 Future research will show if and how academic migrants gained access to other academic 
positions. Gugerli, Kupper, and Speich, Die Zukunftsmaschine, 240f. 
4 Conclusion


Data visualizations, as shown in this article, are invaluable in answering
“traditional” research questions about the consequences of Nazi politics for German
academia with respect to emigration, routes of migration, changing reputations 
of research institutions, networks, and transfer of knowledge; they bring together 
all the different aspects of academic forced migration into a larger picture, as 
the example of mathematicians has shown. The forced dismissal of the scientists
had a tremendous impact on German institutions (the Göttingen Institute
of Mathematics, for instance, lost its academic standing in the world), on individual academic
biographies, and on institutions outside Germany. Personal and institutionalized
networks play a key role in the process of academic migration as well as in exile. 
Yet, data visualization also enables us to detect unexpected events and patterns 
and thus generate new questions in the field of Exilforschung, such as differences 
between academic disciplines, the extent of multiple migrations, or the relevance 
of personal vis-à-vis institutional networks. 

Moreover, digital tools such as nodegoat allow us to study the macro-level 
of academic forced migration as well as its micro-historical aspects, as our data 
model allows us to zoom in on the career paths and personal networks of indi- 
vidual scholars. We have employed a research-question driven digital approach, 
which exemplifies Miriam Rürup's plea, in the concluding roundtable of the 
January 2021 #DHJewish conference, to bring an “analogue perspective” to the 
digital. However, there are limitations to the digital approach. If we want to 
address more biographical issues, such as the experiences of individual exiled 
scholars or personal contingency management strategies, as raised by Adorno, 
we need to also integrate more traditional historical methods and the kind of 
qualitative approach that a human close reading of primary sources offers. 

Migration is a core element of Jewish existence, from ancient history to modern 
days, and is thus a central research interest of academic Jewish Studies. Migration 
is being studied in the context of religious, literary, and cultural studies because 
the individual and collective experiences of migration of the Jewish people have 
found expression in religious and profane texts, in art, and in memoirs. Historical 
research into the topic covers a broad field of questions including political, social, 
and cultural changes as well as questions about gender and age, to mention just 
a few. 

We understand our project on forced academic migration to be part of the 
field of Exilforschung. Exilforschung primarily deals with single and collective 
biographies of German-speaking émigrés, with their personal and professional 
experience in exile as well as their contribution to the cultural, industrial, and 
academic sphere. A tremendous amount of research has been conducted in this 
field thus far and has helped us to understand the complexities and ambiguities
of exilic existence. Our data-driven projects add to the field by making interconnected
comparisons possible. So far, the history of exiled scientists has been divided
into studies on aid organizations or selected disciplines, research on questions
such as gender, publications on universities and academic associations, and
books dealing with the consequences the expulsion had for German academia.
Through a digital approach it is now possible to integrate these aspects into a more
comprehensive picture, and thereby greatly enhance our understanding of the
phenomenon of Exil.
Maja Hultman


The GIS prism: Beyond the Myth 
of Stockholm's Ostjuden 


Abstract: In this chapter, I argue that the GIS approach holds the potential to 
challenge historiographical master narratives in Jewish urban history. Using 
Stockholm's modern Jewish population as a case study, I propose that the digital, 
quantitative studies associated with GIS can be used as analytical prisms through 
which to explore qualitative sources. In the case of Stockholm's Jewry, this meth- 
odology allows for a re-examination of spatially inscribed tropes, particularly the 
so-called Ostjude. 

I begin the article by describing the largely unchallenged historiographical 
idea that Stockholm's Jewish pre-1939 population was divided into two groups: 
the integrated, Reform, and northern-residing Jews, and the Eastern European, 
poor, orthodox, and southern-residing Jews - the Ostjuden. Introducing the ana- 
lytical possibilities and methodological challenges of the GIS approach, I there- 
after use ArcGIS to digitally map Jewish economic engagement with Stockholm's 
urban topography in relation to members of two synagogues, one Reform and 
one orthodox. The results show that the two religious groups utilized a unified 
geographical integration and created communal connections across religious 
barriers. 

With this new framework in mind, I lastly turn to a newspaper article, written 
by a Reform Jew in 1905, that describes a shabbat service in the orthodox
synagogue. Textual analysis reveals the author's construction of the spatially 
inscribed stereotypes previously mentioned, in particular the ostracized trope of 
the Ostjude, and their loose ties to the Jewish community's social reality. Thus, 
this chapter shows that the GIS approach is vital for understanding the Swedish 
Jewish community's creation of tropes to sustain inner-communal hierarchies. 


Keywords: digital, settlement patterns, inner-communal relations, urban topog- 
raphy, Swedish Jews 
1 Introduction


Exiting a staircase and standing on the threshold into a room on the right-hand 
side, a reporter from the Swedish daily newspaper Dagens Nyheter (Daily News) encounters the
following scene in the shul of Adass Jisroel on March 5, 1905:


They read the prayer with fierce waddles and bow with their backs towards the congrega- 
tion. They turn around, and now I see these two shapeless figures with head and a bigger 
part of the body covered by the white-blue striped shawl, and I feel closer to the Orient when 
they slowly, murmuring, waddle to and fro. There is something mystic, something ancient 
about it all, which makes me feel, despite the fact that I should find it all ridiculous, partly 
moved by these ceremonies, which the faithful have kept for centuries despite oppression, 
persecution, ridicule and contempt, ceremonies that might have been performed by my 
own forefathers some millennia ago in Solomon’s own temple. But as I get back to the vesti- 
bule again, I am yet again in the twentieth century. The electric tram turns down the street, 
and unabashedly, I light my cigarette, even though it is Sabbath.¹


The writer, using the signature “H V-n,” describes a shabbat service at the shul, 
located in a rented room in a former Pietist girls’ orphanage on the slum and 
industrial island of Södermalm in southern Stockholm. In the article, Hugo Vallentin,
the Jewish editor behind the signature,² depicts members of Adass
Jisroel as fundamentally different from “acclimatized” Jews, who supposedly 
attend the "grand" and "sophisticated" services in the purpose-built Great Syn- 
agogue in the northern, central parts of the Swedish capital. Indeed, according 
to Vallentin, members of Adass Jisroel and the Great Synagogue are socially and 
culturally dichotomized, divided by ethnic background, religious practices, and 
economic status. He inscribes these differences into the city's topography, locat- 
ing integrated, Reform, and modern Jews in the northern district, and Eastern 
European, poor, and orthodox Jews in the southern district. On the shabbat 
morning of March 5, 1905, as Adass Jisroel's members with “dark eyes” and “long
beards” pray, Hugo Vallentin, Swedish-born and among the Great Synagogue's
third richest members,³ exits the building, lights a cigarette, and hops onto a tram
to get back to his home in northern Stockholm,⁴ trusting that readers will under-


1 H V-n, “De rättrogna,” Dagens Nyheter, March 5, 1905, 3, at Swedish Royal Library (henceforward
referred to as SRL). All subsequent quotes from “H V-n” originate from this article.

2 Register of pseudonyms and signatures, SRL. 

3 Hugo Vallentin was born in 1860 in Gothenburg. In 1909, he paid 120 Swedish kronor for his 
membership to the Mosaic Congregation in Stockholm; see: Register of taxpayers for 1910, SE/ 
RA/730128/01/A1a79, Swedish State Archive (henceforward referred to as SSA). 

4 According to the register of Stockholm's taxation records, available at Stockholm's City Archive 
(henceforward referred to as SCA), Hugo Vallentin lived on Tegnérgatan 37 in 1905 and 1909. 
stand that he is breaking a shabbat rule, and, therefore, is not a member of the
Jewish “colony” in southern Stockholm. 

Is Vallentin's depiction of a dichotomized, spatially inscribed Jewish com- 
munity in Stockholm in the beginning of the 20th century correct? The narrative 
has not lost its potency in the last hundred years and has been reproduced and 
cemented in Swedish Jewish historiography and historical memory. Although 
some researchers have noted that Stockholm's small Jewish population of some 
3,000 people never established an Eastern European, Jewish urban district com- 
parable to, for example, Scheunenviertel, the East End, Marais, and Leopoldstadt 
in Berlin, London, Paris, and Vienna respectively, the supposed division between 
integrated, Reform, northern-residing Jews and Eastern European, poor, ortho- 
dox, southern-residing Jews — Stockholm’s version of the Ostjude — persists. In 
this chapter, I show that geographical analysis through a GIS (Geographic Infor- 
mation Systems) approach challenges the established historiographical narrative 
and accentuates a “more chaotic, more contingent, more fluid, more uncertain, 
more ambiguous, more immediate – in other words, more fully human”⁵ experience
of Jewish life in Stockholm. Facilitating an analysis based on quantitative
sources, GIS stresses the pivotal role of Stockholm's topography, uniquely defined 
by islands, islets, peninsulas, inlets, bays, streams, and straits, in shaping Jewish 
negotiations on social integration, religious practices, and internal relations. As 
I will argue, a GIS approach is highly useful to debunk spatially inscribed, histo- 
riographical master narratives and highlight the complexity of the Jewish, urban 
experience. 

In order to show the promise of GIS, I first introduce the historiographical
narrative of Stockholm's dichotomized Jewish population pre-World War II. The 
synagogues mentioned in Vallentin's article, Adass Jisroel and the Great Syna- 
gogue, receive particular emphasis, since their divergent geographical positions 
and religious affiliations have previously served as a foundational framework for 
understanding Jewish life in the Swedish capital as divided. The potential of the 
GIS approach to challenge historiographical discourses, despite current discus- 
sions on its limitations, is secondly highlighted, and I argue that GIS functions 
as a prism through which qualitative sources, such as Vallentin's article, should 
be evaluated and examined. I also introduce some of the multiple methodolog- 
ical challenges associated with the digital software tool ArcGIS, and how I have 


5 David J. Bodenhamer, “Chasing Bakhtin's Ghost: From Historical GIS to Deep Mapping,” in
The Routledge Companion to Spatial History, ed. Ian Gregory, Don DeBats, and Don Lafreniere
(London: Routledge, 2018), 539. 
approached them in my own study. Having thus explained the historiographical
setting and the digital methodology, I revisit the Jewish community in Stockholm 
and compare the economic status of members of Adass Jisroel and the communal 
organ the Mosaic Congregation, which mainly supported the Reform community 
belonging to the Great Synagogue. As I will show, the GIS approach efficiently 
determines their economic and geographical similarities, raising the question 
whether they should indeed be regarded as two groups. With this new frame- 
work in mind, I lastly return to Vallentin's article and his, and the Swedish Jewish 
historiography's, narrative of a dichotomized Jewish community, and argue that 
Stockholm's Ostjude was an ostracized trope created as a result of the communi- 
ty's internal hierarchy, shaped by the local, urban fabric. 


2 Stockholm's Jewry: A Narrative of Dichotomy 


Echoing editor Hugo Vallentin's narrative, scholars in Swedish Jewish history have,
ever since the publication of Judarnas historia i Sverige (The History of Jews in
Sweden) by Jewish historian Hugo Valentin (not to be mistaken for the above-mentioned
Hugo Vallentin) in 1924, crudely divided Jewish migrants moving
to Sweden from the 18th century until the Holocaust into two groups. The first
Jew to be allowed to practice Judaism settled in Stockholm in 1775, and relatives, 
business colleagues, and friends subsequently moved from mainly Mecklenburg 
to Sweden. Danish Jews simultaneously extended their businesses into Sweden. 
This first migration group from central and northern Europe is known for aiding 
the development of Sweden's commercial consumption through sugar and textile 
productions, the foundation of banks and modern industries, and involvement 
in the construction of railroads and extraction of ore.⁶ Later generations became
influential in cultural spheres as well, being artists, writers and intellectuals, 
financial donors to the capital's Concert Hall, constructers of Stockholm's modern 
architecture, publishers of the national literary canon, and owners of department 
stores and famous restaurants.⁷ During the decades before the emancipation in


6 See, for example: Fredric Bedoire, Ett judiskt Europa: Kring uppkomsten av en modern arkitektur,
1830-1930 (Stockholm: Carlsson, 1998), 18-32; Anna Brismark and Pia Lundqvist, “Sidensjalar
och socker: Judiska näringsidkares betydelse för konsumtionsrevolutionen i Sverige,” in
Från sidensjalar till flyktingmottagning: Judarna i Sverige – en minoritets historia, ed. Lars M.
Andersson and Carl Henrik Carlsson (Uppsala: Historiska institutionen, 2013), 17-47; Hugo
Valentin, Judarna i Sverige (Stockholm: Bonniers, 1964), 85-95.

7 See, for example: Mia Kuritzén Löwengart, En samhällelig angelägenhet: Framväxten av en 
symfoniorkester och ett konserthus i Stockholm, cirka 1890-1926 (Uppsala: Acta Universitatis
1870, many Jews in Stockholm moved from the medieval district on the island of
Stadsholmen, today called Old Town, into modern developments on the northern 
mainland. 

The purpose-built Great Synagogue was built in the vicinity of this newly 
established commercial and cultural city center. Constructed during the 1860s, 
the new synagogue received an architectural design linked to the emerging 
Reform Judaism, aimed at modernizing Jewish religion in Europe. In contrast to 
its central location in the previous synagogue in Old Town, the bimah was instead
placed at the eastern end of the nave in the new synagogue, close to the Aron
Hakodesh, facing an organ and space for a choir at the other end. Its inaugura- 
tion ceremony on September 16, 1870 included sermons and hymns in Swedish, 
the latter accompanied by the organ. Throughout the 19th and 20th centuries, 
the rabbi’s attire was furthermore inspired by Christian priests, “confirmation” 
for girls was added, and many members discarded the practices of kashrut and 
shabbat. The Great Synagogue became a space for modernized religious prac- 
tices, inspired by the Protestant environment in Sweden. 

The second group of Jewish migrants included some 3,000 to 4,000 Jews migrating
mainly from Grodno, Kovno, Suwalki, Vilna, and Vitebsk to Sweden between
the 1860s and 1917.⁸ With the arrival of Eastern European Jews, Stockholm's Jewish
population grew from 900 people in 1870 to 1,250 Jews in 1890 and 2,600 Jews
in 1910.⁹ Although many worked as peddlers,¹⁰ Eastern European Jews were not a
static group of poor Jews. Rita Bredefeldt shows that they used both industrial and
trade sectors to advance economically and integrate socially.¹¹ Individuals from
Eastern Europe were also successful in commerce, and some were famous artists, 


Upsaliensis, 2017); Jacqueline Stare, Porträtt: Speglingar av svensk judisk kultur (Stockholm: Fall- 
marks, 1993), 26-52. 

8 These areas had not experienced widespread pogroms but rather crop failures, famine, and 
military conscriptions. In 1890, about a fifth of those who had arrived in the last 30 years had 
used Sweden as a transit nation, continuing their travels towards America. The migration to 
Sweden largely stopped in 1917 with the introduction of passports. See: Carl Henrik Carlsson, 
"Immigrants or Transmigrants? Eastern European Jews in Sweden, 1860-1914," in Points of Pas- 
sage: Jewish Migrants from Eastern Europe in Scandinavia, Germany, and Britain, 1880-1914, ed. 
Tobias Brinkmann (New York: Berghahn, 2013), 55-56; Carl Henrik Carlsson, Medborgarskap och 
diskriminering: Östjudar och andra invandrare i Sverige, 1860-1920 (Uppsala: Acta Universitatis 
Upsaliensis, 2004), 31. 

9 Ingvar Svanberg and Mattias Tydén, Tusen år av invandring (Stockholm: Dialogos, 1992), 237.
10 For a description of life as a Jewish peddler in Sweden, see: Jacqueline Stare, ed., Judiska
gårdfarihandlare i Sverige (Stockholm: Judiska Museet, 1996).

11 Rita Bredefeldt, Judiskt liv i Stockholm och Norden: Ekonomi, identitet och assimilering, 
1850—1930 (Stockholm: Stockholmia, 2008), 59. 
such as painter Isaac Grünewald. Reconstructing an ordinary day on Södermalm
at the beginning of the 20th century, Mats Franzén notes that although Eastern
European Jews and Italians were both considered strangers in the streetscape,
the former integrated occupationally and culturally faster.¹² Putting together these
historical fragments, it is clear that Eastern European Jews arriving in Stockholm 
were generally determined to integrate into the capital's society. 

In response to the construction of the Reform-aligned Great Synagogue, 
Adass Jisroel was created in 1870, and as mentioned above, it was set up in a 
Pietist orphanage on the island of Södermalm, south of Old Town. Hugo Vallentin
describes in his article that the bimah was placed at the center of the room rented 
on the first floor, and a women's balcony was constructed. Correspondence from 
Adass Jisroel's leaders throughout the first decades of the 20th century describes 
the place as a “shul,” “synagogue,” “minyan,” and “chevra” respectively, and its
religious orientation is ambiguous, ranging from “traditional” and “orthodox”
to “conservative.” Dedicated male members, however, participated in morning
and evening prayers, and families kept kosher, although they also had to navigate
the rules of shabbat. Some, for example, used the tram to get from their home to
shabbat services in the synagogue,¹³ although paying for the tickets was a breach of
shabbat rules. Despite the unclear religious stance of Adass Jisroel, it was noticeably
more traditional than the Great Synagogue.

Research has only scratched the surface of these two groups' relationship 
beyond Vallentin's and Valentin's suggested dichotomy. Carl Henrik Carlsson's 
study on Eastern European Jewish acquisition of Swedish citizenship at the turn 
of the 20th century shows that the first group, as leaders of the Mosaic Congrega- 
tion, more often than not wrote favorable character descriptions to Stockholm's 
Police to assist the process.¹⁴ Studies on Jewish philanthropy in Stockholm similarly
show that wealthier Jews aided some of the poorer families arriving from
Eastern Europe. They built cheap apartments, established a youth leisure center,
provided clothes, shoes and food for children, and formed a summer youth camp
on an island in Stockholm’s archipelago.¹⁵ Anna Besserman argues that “the only


12 Mats Franzén, Den folkliga staden: Söderkvarter i Stockholm mellan krigen (Lund: Arkiv
förlag, 1992), 144-54.

13 Interview with Henry Blideman (October 21, 2014). 

14 Carlsson, Medborgarskap och diskriminering. 

15 See, for example: Anna Besserman, “‘...Eftersom nu en gång en nådig försyn täckts hosta
dem upp på Sveriges gästvänliga stränder’: Mosaiska Församlingen i Stockholm inför den
östjudiska invandringen till staden, 1860-1914,” Scandinavian Jewish Studies 5 (1984): 32; Svante
Hansson, Flykt och överlevnad: Flyktingverksamhet i Mosaiska församlingen i Stockholm,
1933–1950 (Stockholm: Hillelförlaget, 2004), 62.
existent contact space was between the needy and the philanthropists.”¹⁶ Other
scholars and public educators have adopted and circulated this dichotomized
image of established Jews and Eastern European Jews, inscribing supposed
religious, ethnic, and economic differences into the urban space. Some echo
Vallentin and argue, without much research evidence, that there existed a territorial
division between the groups, mainly due to Eastern European Jews’ “particular
Jewish life and institutions, which, together with their foreignness and their
relatively poorer economic circumstances, contrasted with the more established Jews
of the cities.”¹⁷ The discourse was reconfirmed in scholarly work as late as 2004.¹⁸
In late 2021, the City Museum in Stockholm offered guided tours on “Northern
Jews and Southern Jews,” describing the Jewish community as “divided into two
groups into the late 1930s” in their marketing.¹⁹

On the other hand, Carl Henrik Carlsson notes that “the real correlation
‘Western Jew’/Reformed Jew respectively ‘Eastern Jew’/orthodox Jew is not
necessary as strong as it seems or is usually emphasized.”²⁰ In an article on Jewish
migrants making Sweden their new home as a consequence of World War I, he
calls the orthodox yet economically affluent Jacob Ettlinger, chairman of Adass
Jisroel from the end of the 1910s, a “typically untypical Jew” and an example of
how the dichotomized identity “templates are not always true.”²¹ Suggesting that
there were more “typically untypical” Jews, Carlsson was the first to question the
historical memory of Stockholm's Jews. 

The internal structure of the Mosaic Congregation also connected the major- 
ity of Jews living in Stockholm, whether rich or poor, reform or orthodox. The 
Mosaic Congregation was the only official Jewish institution and up until 1910 
legally responsible for providing information for the taxation of Stockholm's 
Jews to the city's municipality. Its membership was, therefore, the only way for 
a Jewish individual to join a community with societal status in the non-Jewish 
sphere. The membership itself could, however, be too expensive for less wealthy 
Jews, as it was calculated in proportion to one's income, and difficult for migrants 


16 Besserman, “Eftersom nu en gång,” 29.

17 Joseph Zitomersky, “The Jewish Population in Sweden, 1780-1980: An Ethno-demographic
Study,” in Judiskt liv i Norden, ed. Gunnar Broberg, Harald Runblom, and Mattias Tydén
(Uppsala: Acta Universitatis Upsaliensis, 1988), 114.

18 Hansson, Flykt och överlevnad, 44.

19 “Norrjudar och söderjudar,” Stadsmuseets kalendarium, accessed September 13, 2021,
https://stadsmuseet.stockholm.se/kalendarium/2021/11/16/norrjudar-och-soderjudar-sodermalm/.

20 Carlsson, Medborgarskap och diskriminering, 34.

21 Carl Henrik Carlsson, “Judiska invandrare i Sverige under första världskriget. Fyra
fallstudier,” in Första världskriget i svenska arkiv: Årsbok för Riksarkivet och Landsarkiven
2014, ed. Carl Henrik Carlsson (Stockholm: TMG Tabergs, 2014), 168-70.
to obtain, since Swedish citizenship was a prerequisite for membership from 1882
on. Because the Swedish government discriminated against Eastern European
Jews in naturalization processes, Swedish citizenship was difficult for migrants
to achieve.²² Consequently, the presence and influence of poorer, migrant Jews in
the Mosaic Congregation were inhibited. 

My previous study on the construction of the Great Synagogue in the 1860s 
relatedly portrays the power relation between the two groups, and how they 
negotiated the future sacred space. As leaders of the Mosaic Congregation, estab- 
lished Jews did not invite Eastern European Jews into the planning process, and 
even ignored and belittled their wishes on the inclusion of a mikveh in the new 
synagogue.²³ The negotiation, although unbalanced, shows that the two groups
had more complex roles to play than those of philanthropists and the needy, with
Eastern European Jews actively striving to influence how the community's shared
spaces would be shaped. As I will show, the Swedish Ostjude presented in Hugo
Vallentin's article from 1905 – the poor, orthodox, Eastern European Jew living
on Södermalm – which has been reproduced and cemented by both Swedish
Jewish historiography and Stockholm's heritage industry, does not align with the
Jewish community's actual topographical engagement with Stockholm. Instead,
I argue that using a GIS approach as a framework for understanding the Jewish
community's relationship to urban topography, firstly, helps us to locate spatial
stereotypes and, secondly, in combination with qualitative sources, suggests the
importance of said stereotypes for internal structures and communal relations.


3 GIS: Analytical Possibilities 
and Methodological Challenges 


Although GIS is fundamental for developing my argument, critics have pointed 
out its limitations in exploring human complexity. GIS tools involve the capture, 
plotting, and analysis of geographical data through collected, quantifiable
information at a specific time and place. They have, therefore, been accused of
producing “static residential spaces,”²⁴ trapping mobile environments and actors
within the boundaries of a chosen date and location of mapping, yielding precise, 


22 Carlsson, Medborgarskap och diskriminering. 

23 Maja Hultman, "The Construction of the Great Synagogue in Stockholm: A Space for Jewish 
and Swedish-Christian Dialogues," Arts: Synagogue Art and Architecture 9 (2020): 33. 

24 Don Lafreniere and Jason Gilliland, “All the World's a Stage’: A GIS Framework for Recreating 
Personal Time-Space from Qualitative and Quantitative Sources," Transactions in GIS 19 (2015): 226. 
organized snapshots, void of lived experience or agency.²⁵ Encountering these
methodological problems, researchers in digital and spatial humanities have
in recent years further fine-tuned the use of GIS to support digital analyses of
human ambiguity. Some add physical attributes, relational information, and
narrative sources to the various software programs to analyze social networks in the
streetscape over time. Others use PGIS (Participatory Geographic Information
Systems) or deep mapping to involve the public in studying temporal multilayers
of meanings attached to a small geographical area.²⁶ I, however, dispute a quick
dismissal of GIS and argue that by using it as a prism for examining qualitative 
sources, human complexity can indeed be captured and explored. 

Within Jewish Studies, the GIS approach has proved vital in contradicting 
accepted historical narratives. The assumed predominance of urban Jewries in 
the Byzantine Empire has, for example, been disproved with a GIS-supported 
analysis.²⁷ Mapping population patterns of Jewish communities in pre-World War
II Poland, Malgorzata Hanzl finds individual motives for migration and
settlement.²⁸ Other studies have used GIS to explore the shape of Jewish/non-Jewish
relations and inner-communal relations in urban settings.²⁹ Although providing
a static picture of human life, GIS can clearly be used to debunk historical myths 
and deliver a geographical grid to be filled with human experiences, encounters, 
practices, and meanings. By placing a GIS-informed, quantitative analysis at the 
forefront of this study, the stereotypical trope of the Swedish Ostjuden, found 
in qualitative sources, is revealed as a constructed, reproduced, and cemented 
image, a product of social webs and internal hierarchies. The use of GIS, in tandem 


25 Trevor M. Harris, “From PGIS to Participatory Deep Mapping and Spatial Storytelling: An 
Evolving Trajectory in Community Knowledge Representation in GIS," The Cartographic Journal 
53 (2016): 319-21. 

26 Harris, “From PGIS to Participatory Deep Mapping and Spatial Storytelling." 

27 Gethin Rees, Nicholas de Lange, and Alexander Panayotov, “Mapping the Jewish
Communities of the Byzantine Empire Using GIS,” in Migration and Migrant Identities in the
Near East from Antiquity to the Middle Ages, ed. Justin Yoo, Andrea Serbini, and Caroline
Barron (New York: Routledge, 2019), 116.

28 Malgorzata Hanzl, “Jewish Communities in Pre-war Central Poland as an Example of a
Self-Organising Society,” in Computational Science and Its Applications – ICCSA 2017: 17th
International Conference, Trieste, Italy, July 3-6, 2017, Proceedings, Part 3, ed. Osvaldo
Gervasi et al. (Cham: Springer, 2017), 231.

29 See, for example: Mary Anne Poutanen and Jason Gilliland, “Mapping Work in Early
Twentieth-Century Montreal: A Rabbi, a Neighbourhood, and a Community,” Urban History
Review 45 (2017): 7-24; Máté Rigó, “Ordinary Women and Men: Superintendents and Jews in the
Budapest Yellow-Star Houses in 1944-1945,” Urban History 40 (2013): 71-91.
with qualitative sources, sheds light on the ambiguity and complexity of human
experiences. 

In this chapter, I will use GIS to map Jewish residences in Stockholm in 1909 
and 1935. The geographical distribution of Jewish homes is analyzed in relation 
to economic and religious aspects. The specific geographical locations of Adass 
Jisroel and the Great Synagogue are shown to be part of a complicated and unbal- 
anced web of communal relations, and as such, they stand as an example of 
the religious, ethnic, and economic complexity that existed within Stockholm's 
Jewish community before World War II. As the economic links and spatial resem- 
blances between the Mosaic Congregation and Adass Jisroel become clear, I will 
argue that their members were more similar than previously believed. 

Using a GIS approach is not without practical and methodological challenges, 
however. I chose to work with the software program ArcGIS. The georeferencing
of historical maps, that is, placing historical images on top of today's geographical
grid, was a particularly time-consuming task, and the final result received a rather
high RMS (Root Mean Square) error, which potentially offsets the exact
location of Jewish residences.³⁰ Furthermore, the application of ED (Enumeration
District) boundaries, which allow the map to be divided into urban districts
and thereby enable subsequent comparisons between districts, proved too
time-consuming for this project.
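
To make the RMS error concrete, the following Python sketch shows one common way of computing it from the residuals of georeferencing control points, that is, the offsets between where each point lands after transformation and where it should be. The residual values are hypothetical and are not taken from the georeferencing carried out for this study; ArcGIS reports the figure directly, and the function name is an assumption made for illustration.

```python
import math


def rms_error(residuals):
    """Root mean square of control-point residuals.

    `residuals` is a list of (dx, dy) offsets in map units between the
    transformed position of each control point and its target position.
    """
    if not residuals:
        raise ValueError("at least one control point is required")
    # Squared distance of each residual vector from zero.
    squared = [dx * dx + dy * dy for dx, dy in residuals]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical residuals for three control points (invented values).
example_residuals = [(3.0, 4.0), (-6.0, 8.0), (0.0, 0.0)]
print(round(rms_error(example_residuals), 3))
```

A lower value means that the historical map fits today's grid more closely; a high value signals that plotted addresses may be displaced by a comparable distance.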

A second challenge was the collection of quantitative data from a variety of 
sources, located in different archives. Data was collected from the Mosaic
Congregation's membership list in 1939, records of “alien faith believers” in local church
parishes, and local city taxation records from 1935.³¹ Together, they provide
information on residential address, name, gender, marital status, age, occupation,
birthplace, and, in case of membership in the Mosaic Congregation, internal
taxation. To make sure that I included only people who defined themselves as
Jewish, I first collected names from the Mosaic Congregation's membership list,
adding those who were not members of the Mosaic Congregation but defined
themselves as “Mosaic” in public records related to local church parishes. Although I
came across other Jewish-sounding names in the taxation list, I did not include 
them, since I had no proof of their Jewish identification. Different name spellings 
in handwritten sources made the manual cross-reference between the list of Jews, 


30 The historical map that was used is: H. Hellberg and A.E. Påhlman, “1934 års karta över
Stockholm med omgivningar (1917-1934),” SE/SSA/Stockholmskartor, SCA. The RMS error
number is 29,1225.

31 Mosaic church books at SSA; Christian parish books of Adolf Fredrik, Engelbrekt, Gustav
Vasa, Jakob, Johannes, Katarina, Matteus, Oscar, Sofia, and S:t Göran at SCA; Stockholm's
taxation records at SCA.
gathered from the membership list and church parishes, and their addresses from
the city taxation records difficult, time-consuming, and likely incomplete. Still,
I was able to localize 49.8% (2,488 adults) of Stockholm's approximately 5,000
Jews, adults and children, in 1935,³² which gives me a representative sample of
the whole Jewish population. 
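
The manual cross-referencing described above can be approximated computationally. The following Python sketch is an illustration under stated assumptions and not the procedure used in this study: it compares a name from one source against candidate spellings from another with the standard library's difflib module, the similarity threshold of 0.85 is an assumed cut-off, and the spelling variants are invented for the example. The final lines reproduce the coverage figure from the numbers given above.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and collapse punctuation and repeated spaces."""
    return " ".join(name.lower().replace(".", " ").replace(",", " ").split())


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the most similar spelling.

    The candidate is None when no score reaches the threshold, which is
    an assumed cut-off that would need tuning against real records.
    """
    target = normalize(name)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical spelling variants, invented for illustration only.
taxation_names = ["Jacob Ettlinger", "Isaac Grünewald"]
match, score = best_match("Jakob Ettlinger", taxation_names)
print(match, round(score, 2))

# Coverage of the sample: 2,488 located adults out of roughly 5,000 Jews.
coverage = 100 * 2488 / 5000
print(round(coverage, 1))
```

Any automated match of this kind would still need to be checked by hand against address, age, or occupation, since similar names can belong to different individuals.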

It was not easy to determine which of the 2,488 Jewish individuals were
affiliated with Adass Jisroel. Going through the private archive related to Adass
Jisroel's chairman Jacob Ettlinger, I came across an undated document that listed
members of the shul.³³ It includes two lists, one typewritten and one handwritten,
possibly suggesting that the individuals named had variously strong or limited
affiliations. From its position in the archival folder, as well as some cross-refer- 
enced addresses in city taxation records, it is likely that the lists were compiled 
in the 1940s. Using this document to deduce Adass Jisroel's members, I found the 
addresses of those who lived in Stockholm and were adults in 1935. I am thereby 
able to compare their settlement pattern in relation to that of the Mosaic Congre- 
gation's members. 

The last methodological hurdle was related to the lack of material. Whether 
it is due to Stockholm's historically non-unitary approach towards the admin- 
istration of non-Protestant inhabitants in the beginning of the 20th century, or 
the loss of archival material during a later stage, records of “alien faith
believers” do not exist for all local parishes. Some parish books were, furthermore,
unavailable for research in 2018, when I collected data for the project, due to
Swedish law on public confidentiality.³⁴ This resulted in missing data from five
of Stockholm's 15 geographically defined parishes in 1935.³⁵ Many Jewish
individuals were, however, found multiple times in the ten available records, meaning
that non-members of the Mosaic Congregation frequently moved across parish
borders. Tests on the robustness of digital, quantitative network analysis using
pools of data with missing components also show that results are not as affected
by missing data as previously presumed.³⁶ Although voices from Jews in poorer


32 Clemens Maier-Wolthausen gives Stockholm a Jewish population of 7000 individuals in 1939 
in his book Zuflucht im Norden: Die Schwedischen Juden und die Flüchtlinge, 1933-1941 (Göttingen: 
Wallstein Verlag, 2018), 300; and Pontus Rudberg states that 3,063 Jewish refugees had arrived in 
Sweden in 1939 in The Swedish Jews and the Holocaust (London: Routledge, 2017), 111. The Jewish 
population can therefore be deduced to some 5,000 Jews in 1935. 

33 List of possible members of Adass Jisroel, SE/RA/720483/5/1, SSA. 

34 The Swedish law 2009:400 states that archival documents must be older than 70 years to be 
published online. 

35 The unavailable parishes were: Högalid, Maria, Hedvig Eleonora, Storkyrkan, and Kungsholmen.
36 Yann C. Ryan and Sebastian E. Ahnert, “The Measure of the Archive: The Robustness of Net- 
work Analysis in Early Modern Correspondence," Journal of Cultural Analytics 7 (2021): 57-88. 
circumstances and/or lacking Swedish citizenship are missing, the missing
data potentially has only a modest effect on the analytical results.

These methodological challenges notwithstanding, I use ArcGIS to compare 
and contrast the geographical distribution of members of both the Mosaic Con- 
gregation and Adass Jisroel, as well as determine their connections. The results 
from the analysis of quantitative, geographical data reveal insights that overturn 
scholars’ previous understanding of Jewish traditional life in Stockholm. This 
geographical analysis of a Jewish population is, to my knowledge, the first within 
Jewish Studies that endeavors to use GIS to explore religious minorities and inter- 
nal hierarchies in urban environments. Using class as the analytical entry point 
into the Jewish population's relationship with Stockholm's urban fabric and the 
Jewish community's religious spaces,³⁷ the following section will showcase the
potency of GIS to question historical myths, reveal communal webs, and define 
internal hierarchies. 


4 Revisiting Stockholm's Jewry: Connections 
Across the City 


In the beginning of the 20th century, the city of Stockholm was composed of 14 
islands in Sweden’s third largest lake. Naturally, the inlets, bays, and channels 
divided urban districts from each other, and although bridges were continuously 
built, the topographical divisions exacerbated and cemented socio-economic
differences. The islands of Södermalm in the south and Kungsholmen to the west
became increasingly associated with industry and slums. In contrast, modern
Haussmann-inspired urban developments changed the north-eastern parts of the
city. The topography of Stockholm accentuated socio-economic division and was
used by Hugo Vallentin, among others, to describe the Ostjuden who attended
Adass Jisroel, located on Södermalm, as poor. But who were Adass Jisroel's
members? Did they differ that much from Hugo Vallentin and the other members
of the Mosaic Congregation? Were they mainly poor and Eastern European? And
did they cluster on Södermalm?


37 Discussing Max Weber's distinction between class and status, Till van Rahden uses the 
concept of class to research income and actual economic possibilities and limitations among 
Breslau's Jewish community. See: Till van Rahden, Jews and Other Germans: Civil Society, Reli- 
gious Diversity, and Urban Politics in Breslau, 1860—1925 (Madison: University of Wisconsin Press, 
2008), 23. 
The first analysis (Figure 1) shows the residential distribution of the Mosaic
Congregation's taxed members in 1935 in relation to their economic capabilities. 
The Mosaic Congregation's membership fee was, as previously mentioned, pro- 
portionally determined by each member's income, and an analysis of the fee 
can therefore reveal the economic dimension of urban relationships. As can be 
observed in the map, residences belonging to members paying 0-100 kronor,
68% of the total population, were located across the whole city. Clearly,
apartments for the working classes existed in all districts, making no urban space
impenetrable for the lower classes. In this sense, economic standing did not
automatically lead to topographical segregation. On the other hand, the Mosaic
Congregation's wealthiest members did not set up homes in just any urban district.
Their residences were located mainly in the modern, north-eastern area of
Stockholm, but also along waterfront promenades connected to the main water bodies.
Some also lived on Kungsholmen and Södermalm, in areas with views of the
capital's natural features and urban skyline. Consequently, if members of the Mosaic
Congregation could afford it, they would establish homes in urban districts that 
allowed for an immersion into the modern or uniquely natural environment of 
Stockholm. 
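
The share of members in each fee bracket, such as the proportion paying 0-100 kronor, can be tabulated before mapping. The Python sketch below is a minimal illustration with invented fee values; only the 0-100 kronor bracket is named in this chapter, so the two upper brackets and all function names are assumptions introduced for the example, and fees are assumed to be whole kronor.

```python
from collections import Counter

# Only the 0-100 kronor bracket comes from the text; the others are assumed.
BRACKETS = [("0-100", 0, 100), ("101-500", 101, 500), ("501+", 501, None)]


def bracket_label(fee):
    """Return the label of the bracket that contains a whole-kronor fee."""
    for label, low, high in BRACKETS:
        if fee >= low and (high is None or fee <= high):
            return label
    raise ValueError(f"fee outside all brackets: {fee}")


def bracket_shares(fees):
    """Return the percentage of members falling into each bracket."""
    if not fees:
        raise ValueError("no fees given")
    counts = Counter(bracket_label(fee) for fee in fees)
    return {label: 100.0 * counts[label] / len(fees) for label, _, _ in BRACKETS}


# Invented membership fees in kronor, for illustration only.
example_fees = [50, 80, 100, 250, 900]
print(bracket_shares(example_fees))
```

Joining such bracket labels to the located addresses is what allows each residence on the map to be symbolized by economic standing.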


Figure 1: The residential distribution of the Mosaic Congregation's taxed members (in Swedish 
kronor) in 1935. The red circle represents the location of the Great Synagogue. 
Source: H. Hellberg and A.E. Påhlman, 1934, City Archive, Stockholm.
A similar pattern is found among Adass Jisroel's members in 1935 (Figure 2).
Curiously, members of Adass Jisroel were oftentimes also members of the Mosaic 
Congregation. As mentioned earlier, although the Mosaic Congregation was mainly 
run by descendants of the first migration group at the beginning of the 20th 
century, and was decidedly linked to Reform Judaism, its membership was the 
only way for an individual to join a Jewish community with legal, societal status. 
The social status of the membership seemingly encouraged traditional practition- 
ers to join, connecting people to two religious places. Therefore, the links between 
the Mosaic Congregation and traditional practitioners were many. Adass Jisroel's 
wealthiest members also lived in the modern, north-eastern area of Stockholm. 
While members untaxed by the Mosaic Congregation – meaning that they were
either too poor to afford the membership fee, were not yet Swedish citizens,
or did not want to be affiliated with either the Reform orientation or the leadership –
indeed clustered on the southern parts of Södermalm, the rest of the members were
scattered all over Stockholm. It would have taken between 30 and 45 minutes for 
some to reach the shul for morning and evening prayers. Indeed, the residential 
distribution of traditional practitioners essentially mirrors the pattern of the larger 
community, showing that neither class nor the location of religious institutions 
were large influences on the membership of the shul. Instead, both Reform and tra- 
ditional Jews adopted the same relationship to the city and settled in more modern 
and developed areas if economic means allowed. 


Figure 2: The residential distribution of Adass Jisroel’s members, taxed or non-taxed 
(in Swedish kronor) by the Mosaic Congregation, in 1935. The red circle represents 
the location of Adass Jisroel. 

Source: H. Hellberg and A.E. Påhlman, 1934, City Archive, Stockholm.
Class clearly shaped Jewish life in Stockholm. While not limiting Jews from
settling wherever they wanted, it was simultaneously used to gain access to 
modern and status-infused areas in the city. The ArcGIS analysis shows that the 
geographical distribution of Jewish residences was not influenced by religious
orientation. Many attending Adass Jisroel in the 1930s were not only living off
Södermalm, but also had comparably comfortable economic means. In other words,
Adass Jisroel's members were not only poor Jews living on the southern island. 
Was this also the case when Hugo Vallentin visited Adass Jisroel in 1905? No mem- 
bership list for Adass Jisroel exists from the time, but taking Hugo Vallentin's, and 
Swedish historians', argument that the shul's members had no links to the Mosaic 
Congregation, I have also analyzed the residential pattern of Jews not taxed by, 
and therefore without membership in, the Mosaic Congregation in 1909. Again, an 
equal residential distribution across the whole city emerges (Figure 3). Whether 
the untaxed people were poor, did not hold a Swedish citizenship, or did not 
want to be a part of a community synonymous with Reform Judaism, they were 
living in diverse urban districts. In contrast to, for example, Viennese Jews,³⁸ Jews
in Stockholm did not settle or resettle in the vicinity of other Jewish individuals 
with a similar economic status or religious affiliation. Instead, when setting up 


Figure 3: The residential distribution of Jews not taxed by the Mosaic Congregation in 1909. 
Source: Alfred Bentzer, 1909, City Archive, Stockholm. 
their private homes in Stockholm, Jews aligned to class divisions topographically
inscribed into the urban landscape. 

Accordingly, there is a stark difference between the image of the Ostjude 
delivered by Hugo Vallentin, and the cemented historiographical narrative of a 
dichotomized population, and the result of the ArcGIS mappings conducted in 
this research. While previous assumptions have crudely inscribed two divergent 
stereotypes into Stockholm's topography, the digital analysis shows that there 
were connections and similarities between traditional Jews and the Reformist 
Mosaic Congregation. Not only had members of both synagogues a similar dis- 
tribution of economic means, but many individuals were also members of both 
communities. Although the synagogues were positioned in two different urban 
districts, their members were both socially interlinked and formed similar rela- 
tionships to the urban landscape. Despite understanding and practicing Judaism 
differently, the members of Adass Jisroel and the Mosaic Congregation thus had 
more in common than has previously been communicated through the image of 
the Ostjude and the idea of a dichotomized Jewish population. 

With this spatial framework in mind – a unified geographical integration
and communal connections across religious barriers – Hugo Vallentin’s exotified
description of Adass Jisroel's members gains new meaning. In his article, Vallen- 
tin argues that the general, traditional practitioner exclaims “He is leading us 
towards baptism!” when the Mosaic Congregation's rabbi makes “trivial simplifi- 
cations and swedicizes the [religious] ritual." Many members of Adass Jisroel did, 
however, belong to both religious communities. Similarly, Vallentin describes 
that “many of its members are retailers, with or without a shop. There is also one 
or two artisans, and their numbers are probably growing due to the commend- 
able efforts of the upper class to lead the younger generation onto new paths." 
Despite his and earlier scholarly studies' emphasis on the traditional Jew as a
Jew in need of philanthropic aid, sources show that Adass Jisroel had members of
similar economic caliber as the Mosaic Congregation. Lastly, Vallentin notes that
most members of Adass Jisroel lived on Södermalm, while my spatial analysis has
determined the obvious similarities of unclustered residential patterns between
the Mosaic Congregation and Adass Jisroel. 

Vallentin was, clearly, not relaying the social reality of Jewish life in Stockholm, 
but rather a constructed, imaginary image. As portrayed by Steven Aschheim, an 
antipathy flourished among integrated German Jews in the 19th and 20th centuries
towards the traditional, and often also Eastern European, Jew.³⁹ The Swedish


39 Steven E. Aschheim, Brothers and Strangers: The East European Jew in German and German
Ostjude was similarly a trope created by established Jews to belittle poorer Jewish
immigrants and their cultural practices. Although members of Adass Jisroel were 
as economically and spatially integrated as members of the Mosaic Congregation, 
they were associated with the industrial and slum topography of Södermalm. This
created a trope that, in tandem with philanthropy and the exclusion from member- 
ship in the Mosaic Congregation, ostracized Eastern European Jews, downplayed 
the importance of traditional rituals, and belittled the agency of poorer Jews. Using a 
GIS approach to explore the topographical context of Vallentin’s article, an underly- 
ing internal hierarchy, determined by the social and religious prejudice of the Mosaic 
Congregation’s bourgeois members, is exposed. The quantitative, digital analysis 
reveals the connections between Stockholm’s Jews across geographical, economic, 
and religious borders. When exploring qualitative sources relaying personal expe- 
riences of Jewish space through this framework, the contemporary condescension 
and ignorance of this Jewish social fabric instead ignite questions of how and why
stereotypical tropes emerged, and how they influenced people’s relations with the 
Jewish community and the larger city. In other words, the GIS approach exposes the 
complex and ambiguous experience of being Jewish in Stockholm at the beginning 
of the 20th century, and its prism can guide and direct future scholars’ explorations 
of Jewish urban life. 


5 Conclusion 


Starting with Vallentin’s prejudiced depiction of a shabbat service in Adass Jisroel 
in 1905, this chapter has, with the help of ArcGIS, excavated the topographical 
reality of the shul’s members at the beginning of the 20th century. I argue that
today’s historiographical narrative of divergent Ostjuden and established Jews
has little basis in the historical evidence of social connections and unified economic
behaviors, and is a persisting, monolithic image that does not reflect Adass Jis- 
roel’s community at the time. Using a historically informed GIS approach allows 
us to go beyond the imaginary Ostjude, a product of class-based power structures 
inscribed in Stockholm’s topography. Through quantitative and data-driven, 
digital methods, we can explore the complexity and ambiguity that defined 
Jewish life in Stockholm at the beginning of the 20th century. 

Neatly situated in the recent trend of scholarly work on Jewish local, urban 
communities,⁴⁰ this research’s emphasis on quantitative data and use of digital


40 To mention only a few examples from the pool of excellent research: Barbara E. Mann, A 
Place in History: Modernism, Tel Aviv, and the Creation of Jewish Urban Space (Stanford, CA: Stan- 

methods are relatively rare.⁴¹ While most urban historians in this field base their
research on various and multivocal qualitative material, I would argue that top- 
ographical patterns derived from quantitative sources provide vital frameworks 
for contextualizing and understanding such research. My joint use of quantita- 
tive data and digital methodology in performing spatial analysis based on archi- 
val primary sources can hopefully encourage other researchers of Jewish, urban 
communities to dive deeper into the Jewish relationship with urban settings 
across the globe, despite the computational literacy and time-consuming data 
collection needed. What other - or similar — stereotypical tropes can we uncover, 
and what patterns of spatial integration do we find across modern Europe? Is 
there perhaps a Jewish border-crossing, transnational practice of urban integra- 
tion and inner-communal relations that awaits future exploration with GIS? 

Digital humanities indeed hold further possibilities for in-depth explora- 
tions of Jewish inner-communal relations through the topographical lens. While 
the GIS approach reveals internal hierarchies and the use of imagined tropes 
within Stockholm's Jewish community, deep mapping, as applied by scholars 
in spatial humanities, can help to further explore disparate Jewish spatial 
experiences of Stockholm, as well as other cities. Developed to excavate tem- 
poral multilayers of meanings attached to a small geographical area through 
multifarious methodologies and sources, deep mapping is a digital and methodological
entry point into “spatially framed identities.”⁴³ It emphasizes the
use of non-traditional sources, such as folklore, memories, and art, to reach
beyond hierarchies of knowledge. It is a methodology that puts multivocality at
the forefront, and promotes a “nuanced, non-reductionist”⁴⁴ view of the world,
in order to “amplify the voices of marginalized stakeholders, both socially and
ecologically.”⁴⁵


ford University Press, 2006); Natan M. Meir, Kiev, Jewish Metropolis: A History, 1859-1914 (Bloom- 
ington: Indiana University Press, 2010); Devin E. Naar, Jewish Salonica: Between the Ottoman 
Empire and Modern Greece (Stanford, CA: Stanford University Press, 2016). 

41 Van Rahden produces excellent research from his quantitative data, and he also attempted 
to produce a digital, geographical analysis, but lost his work. See: van Rahden, Jews and Other 
Germans. 

42 William Least Heat-Moon, PrairyErth (a Deep Map) (Boston: Mariner Books, 1999); Les Roberts,
“Deep Mapping and Spatial Anthropology,” Arts 5, no. 1 (2016): 1-8.

43 David J. Bodenhamer, John Corrigan, and Trevor M. Harris, “Introduction,” in Deep Maps and 
Spatial Narratives, ed. David J. Bodenhamer, John Corrigan, and Trevor M. Harris (Indianapolis: 
Indiana University Press, 2015), 3. 

44 Harris, "From PGIS to Participatory Deep Mapping and Spatial Storytelling," 320. 

45 Selina Springett, “Going Deeper or Flatter: Connecting Deep Mapping, Flat Ontologies and 
the Democratizing of Knowledge," Humanities 4 (2015): 624. 

Since deep mapping not only includes but centers digital research on minorities
and individuals positioned in societal peripheries, it would be specifically
useful for uncovering and analyzing the different meanings attached to, for 
example, Adass Jisroel and the Great Synagogue. I have collected a variety of 
qualitative material, such as newspaper articles, architectural designs, pho- 
tographs, poems, and private letters, that reveal different uses, emotions, 
and imaginations of Jewish spaces in Stockholm. The use of a deep mapping 
approach could enable the visualization of multiple, disparate spatial stories 
on top of the quantitative, data-driven analysis of the GIS approach and help to 
find new connections or disconnections between them. Furthermore, the online 
publication of such mappings, for instance with a tool like ArcGIS, enables
us to invite readers (or users) to join in the academic endeavor and thereby
facilitate open-ended, public-related research.⁴⁶ Deep mapping can thus not
only help to further destabilize the current master narrative of a dichotomized 
Jewish population in Stockholm, as in the specific example of this article, but 
more generally highlight the complexity of Jewish urban life and the experi- 
ences of previously marginalized voices. 


Piergabriele Mancuso

Archival Research, Virtual Reality, and 3D 
Modeling: Toward a Comprehensive 
Reconstruction of the Ghetto of Florence 


Abstract: The Ghetto of Florence was established by Cosimo I de' Medici, Tuscany's
first Grand Duke, in 1570, a few decades after those of Venice (1516) and Rome 
(1555). Belonging to the Medici, the Florentine ghetto was under the direct control 
of the Medici administrators, whose main task was not only to supervise it physi- 
cally-architecturally but also to ensure its financial and economic feasibility. From 
a documentary-archival standpoint, this led to the production of an extremely rich 
set of papers, hundreds of volumes now stored at the National Archive in Florence, 
consisting of accounting books and financial documents, contracts, maps, and 
cadastral registers, chronologically stretching from the late 16th century to the 
late 19th century when the ghetto - like many Jewish ghettos in post-unification
Italy - was demolished. By studying and analyzing together all the data that these
documents provide (especially maps and textual descriptions), it is possible to
reconstruct the ghetto exactly as it was, as well as how it changed over the centuries.


Keywords: early modern Jewish Italian history, Florentine Jewry, ghetto of Florence,
De' Medici family, virtual reconstruction 


1 Introduction and a Preamble 


Upon completion of a comprehensive survey at the National Archive in Florence of 
all cadastral and accounting papers produced by the Medici bureaucracy between
the mid-16th and mid-18th centuries and concerning the real estate properties of the
Medici family, the Eugene Grant Project Jewish History Program (hereafter EGJHP)
at the Medici Archive Project discovered more than one hundred bulky volumes concerning
the Ghetto of Florence, probably the richest, most detailed, and chronologically
most extended documentation about an Italian ghetto. Consisting of various
and heterogeneous materials, as we will see in more detail herein - including finan- 
cial-accounting papers, blueprints, maps and topographies, and extremely detailed 
textual descriptions of virtually every single segment of the Florentine ghetto — and 
shedding potential light on virtually every aspect of Jewish people in the ghetto, in 
2018 the EGJHP launched the Ghetto Mapping Project (GMP), an ambitious research 
project aiming to reconstruct the architectural features, economic-financial trends, 


Open Access. © 2022 Piergabriele Mancuso, published by De Gruyter. This work is licensed
under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://doi.org/10.1515/9783110744828-008

and demographic trends of the Ghetto of Florence. Due to the heterogeneous character
of these materials, whose study requires exegetical expertise in different
fields, and especially given the uninterrupted flows of additional materials found 
in other archival sites (expanding the project chronology so as to reach the late 19th 
century, when the Ghetto of Florence was demolished), the GMP remains a work in 
progress. The aim of this chapter is to offer a succinct but hopefully comprehensive 
historical outline of the Ghetto of Florence, with a focus on its establishment in the 
late 16th century, to describe the archival materials located up until now, ultimately 
defining the state of the art of our research while also outlining very briefly future 
research perspectives.¹


2 Florence 1570: Genesis and Development 
of an Early Modern Ghetto 


Established by Cosimo I de' Medici in 1570, the Ghetto of Florence is one of the 
oldest ghettos in Italy and Europe, the third after that of Venice in 1516 and that of
Rome in 1555. Located in the very heart of the city center, at an almost equal and very
short distance from Santa Maria del Fiore with Brunelleschi's Dome and Palazzo
Vecchio, the political barycenter of Medici Florence, the ghetto faced Piazza del
Mercato Vecchio (Old Market Square), one of the most important popular trading
spots of the old city, and was certainly not a marginal portion of the urban fabric. The
making of the Ghetto of Florence did not require any major transformation. Con- 
sisting mostly of a group of adjacent old buildings whose foundations dated back 
to the Middle Ages, the ghetto's only really relevant intervention was to turn the 
two main entrances - one facing the Old Market, the other one leading to Via 
dei Succhiellinai (via Roma), another major artery of the city center - into two 
gates that, according to the ghettoization charter of 1570, should have been closed
at night and supervised by Christian guards. 


1 The provisional results of the ongoing research were presented in P. Mancuso, "The Politics 
of Segregation in Grand Ducal Florence: The Ghetto of Florence, from Material Fall to Virtual 
Rebirth - a Short Presentation," in Jüdisches Kulturerbe (re-)präsentieren - Jüdisches Kulturerbe,
ed. Katrina Kessler et al., vol. 2, 45-66 (Braunschweig: Netzwerk jüdisches Kulturerbe c/o
Beit Tfila - Forschungsstelle für jüdische Architektur, 2019); P. Mancuso and L. Vigotti, “From
Centuries-Old Squalor: The Ghetto of Florence, from History to Virtual Life. Introduction to the 
Ghetto Mapping Project," Rivista di Letteratura Storiografica Italiana 1 (2017): 123-34; P. Mancuso 
and L. Vigotti, “Reconstructing a Lost Space: The Ghetto Mapping Project at the MAP," Materia 
Giudaica 22 (2017): 221-32. 

The real problems in the making of the ghetto were of a legal and bureaucratic
nature. First, ghettoizing the Jews, which meant forcibly gathering them from 
all over Tuscany and - in the case of Florence - forcibly relocating families that 
had already developed strong and productive ties with the surrounding Chris- 
tian majority, could be perceived as a major political and judicial-juridical vulnus 
and weakening of the socio-political prerogatives also by the Christian subjects. 
Nothing judicially cogent and serious enough to justify such an action had come 
from the survey of the Jewish population of Tuscany conducted since 1569 by the 
Magistrato Supremo headed by Carlo Pitti, and the only solution was to "forge a 
case," pretending that the vast majority of Jewish money lenders had violated the 
banking charters to such an extent and so seriously as to force the state itself to 
enact physical and juridical separation between Jews and Christians. 

Second, although exerting absolute (today one would say dictatorial) power, 
the grand ducal role was still under the scrutiny of the old governing bodies, espe- 
cially the Senate, as in the case of the 1567 vote to impose badges on Jews. To 
make the ghetto a private Medici property and enable the Medici family to enjoy 
the profits that would be earned in the future, especially in terms of fiscal rev- 
enues (as envisaged by Pitti in his different anti-Jewish plan, although he had 
initially pushed for complete expulsion of the Jews), the area where the ghetto 
would be established first had to be confiscated by the state and on behalf of the 
general interest and only then transferred to the Scrittoio delle Regie Possessioni, 
the official register of the Medici real estate properties. As clearly explained by 
Siegmund, such a property switch took the form of a financial “triangulation”: 
the Christian owners would be fully refunded not with real currency, but with the 
financial interests generated by equivalent sums of money (paid by the state) that 
the owners were forced to deposit into the Florentine Monte di Pietà, at that time 
no longer a charitable institution but one of the Medici's financial banking tools. 
This resulted in a double benefit for the Medici, who “acquired” the ghetto and its 
future fiscal and rent revenues at basically no cost. In addition, the Medici politi- 
cally aligned with the Roman Curia that in the same year (1570) had granted Cosimo 
the grand ducal dignity. The economic-financial liabilities of all this mostly fell on 
the state's coffers (providing funds to turn the Old Market's set of old houses into a 
Jewish ghetto), and obviously on the Jews forced to reside inside the ghetto and pay 
house rents (by law about 30% higher than in the rest of the city).

Initially amounting to no more than 500-600 people, the Jewish population 
of the Florentine ghetto rapidly grew until, in 1704 under the rule of Cosimo III de'
Medici (1642-1723), it required the creation of an additional area (Ghetto Nuovo) 
on the northern side of the old ghetto. 

At the end of the 19th century, between 1884 and 1892 and after the process 
of political unification of the Italian Peninsula, the entire historical city center, 
then including the area of the two ghettos, underwent a massive and
much-debated general urban plan, pompously called the “Risanamento,”² in order
to change radically the physiognomy of the old city. Since the early 19th century,
Jews had gradually left the ghetto and, by the end of the century, it had become
one of the most degraded areas of the city, with the small Jewish houses now
being occupied by sub-proletarians and criminals of various types. Demolishing
the area meant getting rid of the shameful past and present (see Figure 1).³


3 Archival and Extra-Archival Sources 


As with any other Medici private real estate property, the ghetto was heavily
supervised, meticulously controlled by the police authorities, and under the 
strict and direct control of the Scrittoio delle Regie Possessioni, whose main aim 
was to ensure the economic profitability of the estate and take careful note of any- 
thing occurring within and directly or indirectly affecting the physical-architec- 
tural and financial conditions of the Medici private properties. The Scrittoio work 
of ghetto supervision stretched from 1588 to 1808, when the French revolutionary 
army and temporary government lifted all restrictions on Jews (which were then 
partially re-implemented with the return of the Lorena dynasty), on the whole 
amounting to 95 volumes that can be divided into five main categories: 

1. “Entrate e uscite del ghetto” (Revenues and Expenditures)
2. “Registri di acconcimi” (Registers of Damages and Reparations)
3. “Registri di creditori e debitori del Ghetto” (Debits and Credits)
4. “Amministrazione del ghetto” (Accounting Ledgers)
5. “Descrizioni del Ghetto and Piante” (Textual Descriptions and Maps)


Although the first four categories deal with different aspects of the same topic — 
basically the balance between expenditures and revenues - the fifth category 
holds an independent typological position, consisting mostly of textual descriptions
of the entire ghetto, of virtually every single space, from the smallest storage


2 Literally “the Healing.” The term was an effective “flag” that the pro-demolition groups (especially
the local municipality) used in the ideological contention with the opposite front. For
more information on the coining of this term, see Maria Sframeli, Il centro di Firenze restituito
(Florence: Alberto Bruschi, 1989), especially 15-26. 

3 A well-documented study on the demolition of the Ghetto of Florence is offered by O. Fantozzi 
Micali, La segregazione urbana. Ghetti e quartieri ebraici in Toscana (Florence: Alinea, 1995), 
especially 15-86; see also L. Cerasi, “Fiorentinità. Percorsi di un'ideologia identitaria fra Otto e
Novecento," Studi Novecenteschi 28, no. 62 (2001): 311-43. 

Figure 1: Archivio Storico della Città di Firenze, 1422, cass. 49, ins. A. Riordinamento del
centro: planimetria generale della zona compresa tra le vie dei Calzaiuoli, dei Cerretani, dei 
Pescioni e Porta Rossa prima delle demolizioni, con toponomastica, rilievo dei piani terreni e 
nuovi allineamenti stradali previsti (1888). Map of the center of Florence drawn in 1888: with 
the exception of only the buildings in black, the entire city center was torn down. 

room through the shops and private apartments to the synagogues and other
spaces of Jewish sociability. In 1720, a few years after the completion of Ghetto
Nuovo, the Scrittoio produced one of the most accurate textual descriptions of the
ghetto (volume 3559, folder 4) of both Ghetto Vecchio and Ghetto Nuovo, contain- 
ing more than 200 detailed maps that were subsequently transcribed and techni- 
cally perfected and gathered in the Piante dello Scrittoio delle Regie Possessioni 
(Maps of the Register of the Royal Possessions), a branch and subcollection of the 
Scrittoio delle Regie Possessioni archival collection. To the best of our knowledge, 
this is the richest, most detailed, and most comprehensive set of iconographic
materials on a Jewish ghetto, certainly as far as Italy is concerned (see Figure 2). 

The main aim of the textual and iconic materials is to provide the Medici 
administration with a coherent, consistent, and comprehensive set of informa- 
tion about the architectural features and material conditions of the ghetto spaces, 
especially after major transformations, from the enlarging of perimetral areas, as
in the case of the two main synagogues, through the making of additional floors
to host a growing population, to fires, especially the 1670 fire that destroyed part
of the Italian synagogues.⁴

Despite the rich amount of information and the preciseness of the technical
data that one can draw from these sources, they provide us with basically
no information about the accessory elements contained in these spaces
and nothing that sheds light on everyday life in the ghetto. Since the aim of the 
Scrittoio bureaucrats is to “take a photo,” a cold, objective image of the extant
situation, distinguishing clearly and in an unequivocal way what belongs to the 
Scrittoio and what is instead owned or what has been done by the tenants, these 
descriptions deal overabundantly with extremely specific technical features (from 
the shape and type of wood of the beams, for example, to the type of handles on 
windows and doors, the materials used for the gutters, and the forms, shapes, and 
conditions of the nails and other minor supporting elements subject to erosion 
as well as the type of material used for the internal and external wall plasters, 
but paradoxically not their color), but not with the height of the spaces, unless 
something specific and directly dealing with it (for example, the creation of a mez- 
zanine or the demolition of a partition wall) had indeed occurred. 

Much of the missing information, especially the height of the floors, can be 
obtained with a comparative analysis of the historical buildings that survived 
the Risanamento as well as from a rich set of visual materials such as paintings, 
watercolors, and even photographs produced before the demolition works (see 
Figure 3), which took place around 1888-1889, when the ghetto, no longer an area 


4 See Archivio di Stato di Firenze, Scrittoio delle Regie Possessioni, vol. 6559. 

Figure 2: Archivio di Stato di Firenze, Piante dello Scrittoio delle Regie Possessioni, vol. 26, c.
2. The map (produced in 1721) shows the exact location of the Ghetto Vecchio (Old Ghetto), on 
the left side in pink color, and the Ghetto Nuovo (New Ghetto), on the right in orange. 

Figure 3: Archivio Alinari, Florence. Photo taken during the demolition works of the Old Ghetto.
The photo shows the western side of the Italian synagogue, whose additional area was
supported by two columns - here clearly visible - set in Corte del Macello (lit. “the Butcher
Courtyard”).


of forced Jewish residence, had turned into one of the most degraded segments of 
the city center (from this point of view turning into a “modern ghetto”), inhabited
by the local sub-proletariat, an architectural labyrinth, and a tangle of increas- 
ingly interconnected spaces (with cellars, hiding places, and underground pas- 
sages and secret paths) where criminals of all kinds, together with prostitutes and 
a plethora of socially marginalized individuals, had settled. 

Although not dealing directly with the ghetto in architectural-physical terms, 
a rich set of articles and pamphlets written mostly in support of the demolition 
of the ghetto, denouncing its filthy standards of life and promoting its physi- 
cal removal seen as a necessary requirement for the socio-political and even 
economic rebirth of the old city, provides the reader with valuable information
about socio-economic features of the ghetto around the second half of the 19th
century.⁵ The goal was eventually achieved, and its ideological tenets are perfectly
summarized in the top register of the massive arch that dominates Piazza
della Repubblica, the area where the Old Market was located: L'antico centro 
della città / da secolare squallore / a vita nuova restituito (The ancient center of 
the city, from centuries-old squalor to new life restored).⁶


4 Reconstructing a Lost Jewish Space: Sources, 
Methods, and Works in Progress 


The Medici ownership over the ghetto is a unicum, as said, in the history of the 
Italian Jewry and the real reason for such an extraordinary wealth of documenta- 
tion. While “enjoying” the status of a Medici private property (from this point of 
view experiencing equal control as any other component of the patrimony of Flor- 
ence's dynasty), the ghetto was also a place that contained the Jews, a minority 
regulated since the early Middle Ages and the making of the Christian canon laws. 

Reconstructing a lost Jewish space poses several methodological questions, 
including why and for which purpose a reconstruction is done. The textual descrip- 
tions of the Scrittoio, as previously stated, provide us with overabundant pieces of 
information about elements that today one might find irrelevant, or almost totally 
insignificant, while simultaneously withholding critical data. Reconstructing 
the economic and demographic features of the ghetto is a complex but relatively 
straightforward operation whose complexity comes primarily from the exceptional 
quantity of data available. While still gathering, deciphering, and processing Scrit- 
toio accounting and demographic papers, the GMP decided to move forward in 
creating a 3D model of the ghetto primarily based on the 1721 set of maps of the 
Piante dello Scrittoio delle Regie Possessioni collection, as well as virtual
reconstructions of the main ghetto synagogues.


5 A stubborn supporter of the demolition of the ghetto and of the Risanamento urban plan was
the Florentine journalist Giulio Piccini, also known as Jarro, who gathered his numerous articles 
published in the local newspaper La Nazione in Firenze sotterranea — Appunti — Ricordi — De- 
scrizioni — Bozzetti (Florence: R. Bemporad & Figlio, n.d.; Florence: Le Lettere, 2017). Worth men- 
tioning is Carolina Invernizio's lengthy novel, L'orfana del ghetto (Florence: Adriano Salani Tip. 
Edit., 1887; Milan: Editrice Lucchi, 1975), a tearful story not without strong anti-Semitic nuances. 
6 For more information about the Risanamento and the extent to which this changed the physiognomy
of the old city, see Luciano Artusi and Vincenzo Giannetti, “A vita nuova” - Ricordi e
vicende della grande operazione urbanistica che distrusse il centro storico di Firenze (Florence:
Edizioni Lio Terrazzi, 1995). 

The first phase was to assemble and collate all single maps into a series of
paper sketches (using the 1888 demolition plan maps - see Figure 1 - to define 
the perimetral borders), each one corresponding to a separate floor (see Figure 4). 

These paper sheets were then photographed and processed in two dimensions
using AutoCAD and Adobe Suite. Once a standard height was determined
(ca. 2.50 meters, upon comparative analysis of photographic and iconographical
materials and a study of surviving historical Florentine buildings), these materials
(see Figures 5 and 6) were processed using the Rhinoceros and KeyShot programs
and turned into three-dimensional virtual spaces.
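
The extrusion step described above can be illustrated with a minimal sketch. It does not reproduce the Rhinoceros workflow used by the project; it only shows, in plain Python, how a two-dimensional room outline becomes a three-dimensional prism once a storey height is fixed. The 2.50-meter height comes from the text; the function names, the use of meters as coordinate units, and the sample room are assumptions made purely for illustration.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]

FLOOR_HEIGHT_M = 2.5  # storey height reported in the text (ca. 2.50 m)


def polygon_area(outline: List[Point2D]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(outline):
        x2, y2 = outline[(i + 1) % len(outline)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def extrude_room(outline: List[Point2D], floor_index: int,
                 height: float = FLOOR_HEIGHT_M) -> List[Point3D]:
    """Return the prism vertices for a room: the bottom ring, then the top ring.

    floor_index 0 is the ground floor; every floor is assumed (hypothetically)
    to have the same height, so floor n starts at n * height.
    """
    z_bottom = floor_index * height
    z_top = z_bottom + height
    bottom = [(x, y, z_bottom) for x, y in outline]
    top = [(x, y, z_top) for x, y in outline]
    return bottom + top


def room_volume(outline: List[Point2D], height: float = FLOOR_HEIGHT_M) -> float:
    """Volume of the extruded room: floor area times storey height."""
    return polygon_area(outline) * height


if __name__ == "__main__":
    # Hypothetical 4 m x 3 m shop, placed on the first floor above ground.
    shop = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
    vertices = extrude_room(shop, floor_index=1)
    print(len(vertices), "vertices; volume in cubic meters:", room_volume(shop))
```

A uniform height is the simplest possible assumption; rooms with documented mezzanines or raised ceilings would need their own height values.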

Although extremely detailed, as previously stated, and very carefully drawn, 
the maps present numerous (obvious and unsurprising) imperfections that 
appeared only when all the single maps were collated and combined into the different
floor charts. Longer or shorter walls, erroneous correspondences between
lower and upper walls, and missing entrances were adjusted and corrected. 
Once we drew the bi-dimensional floor maps, we noticed that especially on the 
ground floor, many places around the ghetto perimeter had not been mapped and 
no information was provided. The Scrittoio accounting books labeled this area 
as Ghetto Esterno (External Ghetto): mostly shops on the ground floor but also 
private apartments on the upper floors that in both ghettos were given to Chris- 
tian tenants in order to “hide” the Jewish presence, creating a sort of architectural 
buffer zone all around the areas rented, inhabited, or used by the Jews, which the 
Scrittoio called Ghetto Interno, or the Jewish ghetto proper. Being located mostly 
on the ground floor and being surmounted by spaces of equal size and shape, 
these missing places could easily be reconstructed and then integrated into 
the general map. This allowed us to produce a 3D printed model of the ghetto, 
as it appeared at the time of its maximum expansion in the mid-18th century 
(Figures 7 and 8). A slightly more elaborately rendered image of the model was
then inserted into an aerial view of modern Florence, clearly showing how the 
ghetto had been integrated into the city fabric (Figure 9). 
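
The kind of inconsistency mentioned above, an upper wall with no corresponding wall beneath it, can in principle be detected automatically once the floor plans are digitized. The following is a minimal sketch under stated assumptions, not the procedure the project followed: walls are represented as straight segments between two points, coordinates are in meters, and two walls count as corresponding when their rounded endpoints coincide. All names and sample coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def normalize(segment: Segment, precision: int = 1) -> Segment:
    """Round the endpoints and order them so that A-B and B-A compare equal."""
    (x1, y1), (x2, y2) = segment
    a = (round(x1, precision), round(y1, precision))
    b = (round(x2, precision), round(y2, precision))
    return (a, b) if a <= b else (b, a)


def unsupported_walls(lower: List[Segment], upper: List[Segment],
                      precision: int = 1) -> List[Segment]:
    """Return the upper-floor walls that have no matching wall on the floor below."""
    lower_set = {normalize(wall, precision) for wall in lower}
    return [wall for wall in upper if normalize(wall, precision) not in lower_set]


if __name__ == "__main__":
    # Hypothetical plans: the first wall upstairs matches the one below
    # (drawn in the opposite direction and 4 cm off); the second has no support.
    ground_floor = [((0.0, 0.0), (4.0, 0.0)), ((4.0, 0.0), (4.0, 3.0))]
    first_floor = [((4.04, 0.0), (0.0, 0.0)), ((0.0, 0.0), (0.0, 3.0))]
    for wall in unsupported_walls(ground_floor, first_floor):
        print("no wall below:", wall)
```

Matching by rounded endpoints is deliberately crude: it tolerates small drawing errors but misses walls that only partly overlap, so any flagged wall would still need to be checked against the original maps.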

We applied similar criteria for the reconstruction of two specific places in 
Ghetto Interno — namely, the Italian and Spanish synagogues, which were not 
only the two main places for religious worship and traditional education, but 
also the main sites of several Jewish confraternities (providing the population 
with services and forms of social welfare). Due to the demographic growth, the 
two ghetto synagogues, originally meant to serve a population of about 450-550,
underwent several changes over the centuries, mostly enlargements with inter- 
ventions that had an immediate and significant impact on the outdoor spaces 
and the general physiognomy of the ghetto. In the case of the Italian synagogue, 
a wide balcony area was added on the western side and supported by two massive 

Figure 4: Collation of all the individual maps of the first floor of the ghetto taken from Piante
dello Scrittoio delle Regie Possessioni, vol. 26. The missing maps on the left side are the
shops and apartments of “Ghetto Esterno” that the Scrittoio authorities had given to Christian
tenants to conceal the presence of the Jews from the city center. The Medici Archive Project.



Figure 5: The ground floor of the ghetto with the distinction between Ghetto Interno (in red) 
and Ghetto Esterno (sky-blue), a strip of shops and apartments stretching from the ground 
to the upper floors used and inhabited by Christian tenants. These areas were provided with 
independent external entrances. The Medici Archive Project. 

Figure 6: Screenshot of AutoCAD image with superimposition of the ground and first floors of
the ghetto. © The Medici Archive Project.

Figure 7: Work-in-progress of the 3D printing of the Ghetto of Florence.
© The Medici Archive Project.


wooden columns resting in the middle of Corte del Macello (The Butcher Court- 
yard - see Figure 3). 

After careful examination of the maps of the two synagogues produced by the 
Scrittoio officials in 1721, and assuming that the two Florentine synagogues would 
not diverge on a formal level from other synagogues of the same time and of the
same rite and that most of the functional elements - e.g., the Bible-prayer reading 
desk, the position of the benches, the color of the curtains — were in all likelihood 
the same ones found in other places of Jewish worship belonging to the same 
cultural and ritual milieus, the data that the Scrittoio papers do not provide were 
obtained from external sources and then implemented into our virtual model. 

The black-and-white marble floor, quite commonplace in the central-north
Italian synagogues, as well as the color of the curtains, the shape of the benches,
and the type of brass chandeliers that the Scrittoio officials in no way mentioned
in their reports were implemented into the virtual models upon comparative analysis

Figure 8: The final 3D printed model of the Ghetto of Florence according to maps of 1721.
© The Medici Archive Project.


of external sources, mostly the synagogues of Pitigliano (a small hilltop town
in the south of Tuscany under Medici rule since the early 17th century), the Levantine,
Sephardic, and Italian scole of the Venetian Ghetto,⁷ as well as an analysis of
original pieces of Italian synagogue furniture that had been relocated to various 
Israeli synagogues soon after World War II. 

One of the most significant gaps that we filled thanks to such an analysis concerns 
the two tabernacles, which the Scrittoio documents do not describe at all, only briefly 
mentioning their position in the synagogue space. As underscored in previous 


7 For more information about the Venetian synagogues, see Donatella Calabi, Venezia e il Ghetto. 
Cinquecento anni del “Recinto degli ebrei” (Venice: Marsilio, 2016), also available in an English 
translation: Venice, the Jews and Europe - 1516-2016, ed. D. Calabi (Venice: Marsilio, 2016). 
About Pitigliano, see Luigi Cerroni, Breve storia della comunità ebraica di Pitigliano (with English 
translation, A Short History of Jewish Settlement in Pitigliano) (n.p., n.d.); Roberto G. Salvadori, 
La comunità ebraica di Pitigliano: Dal 16mo al 20mo secolo (Florence: Giuntina, 1991). 
Figure 9: Aerial view (graphic processing on Google Maps, Imagery © 2022 Maxar 
Technologies, Map Data © 2022) of modern Florence with a slightly rendered version of the 
virtual image of the ghetto, exactly where it was before demolition. The ghetto not only faced 
the popular Old Market square, but was also very close to the main Christian spiritual sites 
(Santa Maria del Fiore Basilica and the Baptistry, on the left side) and the political barycenter, 
Palazzo Vecchio, the Medici's main building, on the right side. 


studies and documented by many archival papers, before demolition of the ghetto, 
the late Renaissance tabernacle of the Levantine/Sephardic synagogue was moved 
to via delle Oche, where two “oratori” (one for the Levantine/Sephardic, the other 
for the Italian minhag) had been opened soon after the inauguration of the majestic 
“Tempio Maggiore.”⁸ 

Around the early 1960s, together with many Italian synagogue artifacts, the 
tabernacle was sent to Israel, becoming part of the synagogue of Keren be-Yavneh, 
an Orthodox yeshivah. The tabernacle was photographed and a virtual rendition 
of it inserted into our virtual model. Much more complex was the identification 
of the tabernacle of the Italian synagogue that, according to the Scrittoio, was a 
lavish and beautifully decorated Baroque structure, done mostly with stucco and 
presumably also marmorino (fake marble done with painted wood) — two mate- 
rials that cannot be reassembled once removed from their original site. Before 


8 Micali, La segregazione, 74-75. 
Figure 10: Archivio della Comunità Ebraica di Firenze. Photo of the tabernacle of the 
Italian synagogue shortly before the demolition of the ghetto. This item has not yet been 
cataloged. 


the demolition of the Italian synagogue, the tabernacle was photographed, and 
copies of the daguerreotype were auctioned for charitable purposes. We found 
one copy in the Archive of the Jewish Community of Florence where the tabernacle 
is clearly visible (see Figure 10).⁹ 

A tentative reproduction of the structure was then inserted into the virtual 
model. Our main goal for the time being was to determine the tabernacle's size 
and its position within the ritual space, not its complex aesthetic features, which 
we hope to reproduce more faithfully and philologically in the near future. An 
attempt to reconstruct the internal space of the two synagogues was made a couple 
of years ago, and a new version, implementing further extra-archival data and - 
most importantly - including a much more faithful reproduction of the tabernacle, 
will soon be released. 


9 Dora Liscia Bemporad, “La scuola italiana e la scuola levantina nel ghetto di Firenze: Prima 
ricostruzione,” Rivista d'Arte - Studi documentari per la storia delle arti in Toscana 8, no. 2 (1987): 
3-48, especially 19-26. See also S.E. Glicksberg, “Minhage bet ha-knesset shel kehillat kodesh 
italki be-‘ir Firenze” (The Customs of the Italian Jewish Community of Florence), Alei Sefer: Studies 
in Bibliography and in the History of the Printed and the Digital Hebrew Book 24-25 (2015): 
253-83 [Hebrew]. 
Figure 11: Virtual reconstruction of the Italian synagogue. The Medici Archive Project. 


5 Conclusions 


What lies beneath the rise and fall of the Ghetto of Florence is a complex set of 
historical events. It was the only ghetto in Italy owned by the ruling dynasty. 
Cosimo I de' Medici made a considerable chunk of the city center his own through 
a complex political and financial operation, basically depriving the Christian 
owners of their properties and turning a potentially very expensive political-administrative 
action into a profitable financial business - at least for his own family. 
In the mid-18th century, the ghetto, as both a place and a system of social segregation, 
started showing evident signs of weakness: since the 1680s, more than 
100 Jewish families, amounting to approximately 400 people, had moved outside 
the ghetto, breaking a number of laws, but clearly with the silent assent of local 
authorities and probably with the sympathetic connivance of several “normal” 
people perfectly aware that their Jewish neighbors should have been living within 
the ghetto. 

The demolition of the ghetto did not take place with the specific purpose of 
cleaning up the city and removing the remnants of a shameful past, but as a com- 
ponent of a much wider urban, socio-political, and economic plan to renew the 
entire city of Florence, a city whose physiognomy was complex, heterogeneous, 
Figure 12: Virtual reconstruction of the Levantine synagogue. The Medici Archive Project. 


deeply grounded in its medieval and premodern times prior to the Risanamento. 
By the end of the 19th century, the Ghetto of Florence had long since turned 
into a sub-proletarian, criminal area, a labyrinth of houses and internal passages 
that time was making into an ever more intricate social anthill. The 
Risanamento did not change only the external physiognomy of Florence, but also 
its social fabric, providing many with better housing and improving their overall 
living standards. The Jews of Florence felt such a hiatus even more dramatically, 
leaving the ghetto not only materially/physically but also metaphorically, abandoning 
the small and dark traditional scole (synagogues) for a magnificent, glorious, 
and perhaps a bit too big “Tempio,” a grandiose building clearly inspired 
by Christian art (the basilica shape is absolutely evident), the only one with 
a dome able to compete with - but certainly not win against - Brunelleschi's 
dome. The reconstruction of the ghetto means not only bringing to (virtual) life a 
place that has been abandoned, but also filling a huge chronological and phys- 
ical gap, shedding light on what is unknown or wrongly assumed. One of our 
most recent discoveries, for example, was the map of the lavish apartment of 
Salomon Levi. Departing from the stereotypical view of the ghetto house as a 
poor, narrow, and overcrowded space, Levi's apartment appears to be a luxuri- 
ous space, decorated with paintings and stuccos, a colorful upper-middle-class 
home and probably the same that Ricciardo Meacci represented in a watercolor 
Figure 13: Archivio di Stato di Firenze, Piante dello Scrittoio delle Regie Possessioni, vol. 26, c. 
29 (1721). The map of the luxurious apartment of Salomon Levi, a prominent textile trader and 
one of the most influential inhabitants of the Ghetto Nuovo. The house consisted of a central 
hall (A) and several rooms (B), two bathrooms, and a terrace. 

Figure 14: Archivio Storico del Comune di Firenze, Ricciardo Meacci, doc. n. 420736. Although 
the ghetto was commonly described and thought to be a decadent and unhealthy area, some 
of its apartments were unexpectedly beautiful and decorated with stuccos and paintings. 
Meacci's watercolor closely reminds us of what the local historian Guido Carocci wrote in his 
report of the historical city center, Il Ghetto di Firenze e i suoi ricordi - Illustrazione storica 
(Florence: Galletti e Cocci, 1886), 31: “Per rovescio di medaglia, in questo ceppo di case posto 
fra Via della Nave e l'Arcivescovado erano quartieri eleganti e decorati con lusso non comune. 
Tuttora si veggono difatti sale adorne di buone pitture e fra le altre è degna di considerazione 
un'ampia sala da ballo con orchestra e colle pareti adorne di ricche cartelle dove sono dipinti 
fatti del vecchio testamento” (on the contrary, in this group of houses set in between Via della 
Nave and the Archbishopric site, there were lavish apartments, decorated with stuccos and 
paintings of various sizes. Still today one can see rooms with nice paintings, among which 
worth mentioning is a dancing hall with an orchestra, with adorned walls where scenes of the 
Bible are represented). 


in 1886. Bringing Levi's apartment back to life is one of several potential reconstruction 
tasks inside the ghetto and one of the many tiny ghetto details that we 
aim to know more about and understand better in a consistent and trustworthy 
reconstructed virtual form. 
Daniel Stein Kokin 


Introducing “Kol ha-Nekudot”/“All the 
Points”/“Kull al-Nuqat”: Interactive, Online 
Mapping of the Israeli-Palestinian Region 
(1840-Present) 


Abstract: This article introduces “Kol ha-Nekudot”/“All the Points”/“Kull al-Nuqat,” 
a digital humanities project and pedagogical resource currently under development. 
Drawing its name from the classical Zionist tendency to refer to communities 
as “points on the map,” “All the Points” is an interactive series of online maps 
depicting the Jewish, Palestinian Arab, and other communities that have existed 
in the Israeli-Palestinian region in any given year from 1840 to the present. “All the 
Points” provides basic information about each community and, by using colors 
and shapes, visually distinguishes among the kibbutz, moshav, Arab village, and 
many other community types and sub-types. Specifically in this piece, I explain 
various features of the enterprise, e.g. the respective rationales for its time range, 
geographical purview, and classificatory scheme; the ways in which the user can 
interact with the maps by adjusting the time bar, selecting the base map, and filtering 
in and out various settlement types; the manner in which data is obtained; 
and how various challenges posed by “All the Points” have been addressed. In 
addition, I discuss the project's specialized, curated maps, focusing in particular 
on three thereof, namely those depicting the “Wall and Tower” communities of 
the late 1930s, the modern history of settlement on the Golan Heights beginning 
in 1878, and settlements of the imagination - i.e., fictional communities that exist 
only on the page, screen, or stage. Finally, accompanying images provide a visual 
impression of “All the Points” and its maps. 


Keywords: Israel, mapping, Palestine, settlement, Zionism 
1 Getting to the Point(s) 


In this article, I introduce and describe “All the Points,” an ongoing digital humanities 
project developing online, interactive maps that explore settlement 
in the Israeli-Palestinian region from 1840 to the present. Produced with the 
Geographic Information System mapping software ArcGIS,¹ “All the Points” 
emerged out of the course “Settlement in Israeli History” that I had the privilege 
to create and teach as a visiting professor at UCLA. This class considered 
the history of Zionism and the State of Israel from the perspective of the diverse 
range of communities founded under their aegis and/or affected by them - i.e., 
the moshava, the kibbutz, the moshav, Tel Aviv and other cities, the Development 
Town, the Arab town and village, and - of course - the hitnahaluyot or 
“settlements” of current controversy. 

The title of this class was deliberately provocative, but for good reason, 
namely that it is impossible fully to understand the history of Zionism and Israel 
without considering the approximately one thousand new communities that were 
founded (and the somewhat more than half as many that were dislodged²) in their 
wake during the past one and a half centuries. Nor would one want to, since these 
settlements, taken together, constitute a vast and perhaps unprecedented labora- 
tory of social experimentation. No other place in the world, so far as I am aware, 
features such an astounding array and complexity of ideologically, ethnically, 
historically, and religiously inflected communities. I wanted my students, most 
of whom had no prior background in Israeli or Palestinian history, to be able 
to visualize the rapid changes that have transpired and the diversity in the built 
environment to which they have given rise. Since most maps are both frozen in a 
particular historical moment and blind as to community type, I resolved to create 
an alternative that would address these lacunae, namely by showcasing change 
over time, and using colors and shapes to distinguish amongst different kinds of 
communities. Thus was born “All the Points.” 

In what follows, I shall discuss the basic contours of the project - its title, 
coverage (both historical and geographical), and component parts - while also 
addressing the challenges it faces. Alongside the text, select images offer a visual 
impression of “All the Points,” and of the main maps and curated maps that comprise 
it. See Figure 1 for the project logo. 


1 ArcGIS is a product of the Environmental Systems Research Institute (ESRI). Upon consultation 
with digital humanities experts at UCLA, I determined that the capabilities offered by this 
software best suited the aims and needs of “All the Points,” at least at this initial stage. 

2 This includes communities in traditional Palestine proper as well as on the Golan Heights. See 
below for further discussion. 


Figure 1: "All the Points" Logo, Meira Stein Kokin. 


2 The Point(s) of “All the Points” 


The name of the project was inspired by, and is in dialogue with, the classic Zionist 
trope of ‘od nekudah ‘al ha-mapah (“another point on the map”) - i.e., the notion 
that each new settlement marks progress along the path of forging or strengthening 
the rebuilt Jewish national home or state.³ In essence, this rhetoric indicates 
that the Zionist, and later Israeli, project was about creating, and subsequently 
completing (to the extent possible), a map. Likely the most famous instance thereof 
concerns the night of October 5-6, 1946, in the course of which 11 new settlements 
were founded in the northern Negev. These communities were collectively known 
as Ahad 'asar ha-nekudot (“The Eleven Points”) and played a crucial role in the 
ultimate incorporation of this region into what became the State of Israel.⁴ 


3 While I am not aware of any extant work of scholarship that specifically explores the emergence, 
development, and larger cultural significance of this rhetoric of points across the history of Zionism 
and Israel, its existence and prominence are adequately attested by the title of the following 
study: Osnat Shiran, Nekudot ‘oz: mediniyut ha-hityashvut be-zikah le-ye'adim politiyim u-vithoniyim 
be-terem medinah uve-reshitah (Points of Strength: Settlement Policy in Relation to Pre-State 
and Early-State Political and Security Goals) (Tel Aviv: Department of Defense, 1998). 

4 On the 11 points, see Sarit Okon, ed., Ahad 'asar ha-nekudot: sipuro shel mivtsa' no'az (The 
Eleven Points: The Story of a Daring Operation) (Alumim: Be'erot ba-Negev, merkaz moreshet, 
hadrakhah ve-eruah, 2017). 
While the celebration of “points on the map” was part and parcel of the Zionist 
consensus in the early decades of Israeli statehood, by the late 20th century widely 
contrasting approaches to this discourse had emerged. On the one hand, in the 
choice of the name Nekudah. (“Point.”) for the influential monthly magazine that 
was closely affiliated with, and published during the heyday of, the national-religious 
settler movement,⁵ we discern especially close identification with - and 
indeed intensification of - the trope, now applied in particular to the territories 
conquered in the Six-Day War. This is reflected visually in the grammatically superfluous 
period following the publication's name, lest the reader miss the point. 

On the other, however, in an important article by the prominent Israeli archi- 
tect and author Sharon Rotbard, we encounter a starkly contrasting perspective, 
namely that the tendency to refer to communities as points “hints at the fact that 
the ‘point’ on the map was more important than the ‘settlement’ itself." Rotbard 
here criticizes the replication of what he calls “the ‘settlement point’ system ... in 
national master plans throughout Israel's history” and appears to be suggesting 
of Israeli settlement that — to evoke and subvert a famous English saying - “it’s 
easy to lose the points for their map." Along the same lines, in the 1995 song 
Ma'aleh Avak about a fictional city on Israel's desert periphery, and its lonely and 
forlorn populace, the famed Israeli rock band Teapacks satirizes the rhetoric and 
practice of Israeli settlement. The lyrics commence as follows: 


“It doesn't make a good impression," thought the leading figures in the government, “there 
are empty patches on the map and down there another point is still missing." So the big 
shots issued an order: "Let's build a city here and we'll also bring some people, who will fill 
up the new houses with their lives. It's great — lots of points on the map. And, anyway, in the 
papers they promised a big scoop." Thus the senior ministers commanded in a drowsy voice 
and ran off to deal with all kinds of emergency situations. A minor deputy came out all the 
way to bless the new settlement, named Ma'aleh Avak.⁸ 


Standing in a long line of reductiones ad absurdum at once literary and car- 
tographical, this passage can be profitably placed in dialogue with arguably the 


5 Nekudah. appeared between 1980 and 2010. It was the official publication of the Amana settler 
movement of the national-religious Gush Emunim (“Block of the Faithful”). 

6 Sharon Rotbard, “Wall and Tower (Homa Umigdal): The Mold of Israeli Architecture,” in A Civilian 
Occupation: The Politics of Israeli Architecture, ed. Rafi Segal and Eyal Weizman (London: 
Verso; Tel Aviv: Babel, 2003), 48. 

7 Rotbard, “Wall and Tower,” 49-50. 

8 Ma'aleh Avak, translation mine. The name of the song can be rendered as both “Ascent of 
Dust” and “Gathering Dust,” simultaneously reflecting both the lofty ambitions and sordid reality 
of the many planned communities on the Israeli periphery. For the complete Hebrew text 
of Ma'aleh Avak, see: https://shironet.mako.co.il/artist?type=lyrics&lang=1&prfid=429&wrkid=2205, 
accessed April 5, 2021. 
most famous of these texts, Jorge Luis Borges’s 1946 short story “On Exactitude in 
Science,” in which “the Cartographers Guilds struck a Map of the Empire whose 
size was that of the Empire, and which coincided point for point with it,” rendering 
their map utterly useless.⁹ For if On Exactitude explores what happens when 
a map succeeds all too well in imitating the reality it wishes to depict, Ma‘aleh 
Avak grapples instead with the deleterious social consequences of the map itself 
effectively becoming that reality, to the neglect of its constituent communities. 

In thus naming my map for its composite points, I attempt to place these 
settlements themselves front and center. And, in referring to “All the Points,” I 
challenge and expand upon the traditional Zionist — and therefore exclusively 
Jewish - connotations of the nekudah ‘al ha-mapah, emphasizing that every kind 
of community counts, and that wherever a new point appears, an old one may 
have existed before. Indeed, as seen above, the complete title of the project features 
Hebrew and Arabic alongside English, and my ultimate goal is to make its 
maps available in all three of these languages.¹⁰ 

Unlike many other online mapping initiatives involving this land, “All the 
Points” is not restricted to any specific geographical portion thereof, nor is it 
invested in any particular ideological or political plan or perspective. Stepping 
back from the questions — as important as they are — of who should or should 
not live or have lived where, it instead asks: who has lived where, when, and in 
what kind of community. My hope is that visitors to the site will be fascinated 
and engaged by what they find there, and that this will in turn spur further and 
deeper exploration, learning, and - ultimately — understanding. 

What, then, does “All the Points” cover? Its time frame, 1840-present, was 
deliberately chosen so as to encompass the modern period of intense Jewish set- 
tlement, European Christian colonization, as well as evolving Arab presence in 
as historically sensible and politically neutral a manner as possible. Thus, 1840, 
the year in which the Ottoman Empire regained political control over Palestine 
from Muhammad Ali's Egypt, is a date of historical significance, but not one that 
arouses passion and disagreement today. 


9 Jorge Luis Borges, “On Exactitude in Science” (originally: “Del rigor en la ciencia”), trans. 
Andrew Hurley, accessed April 5, 2021, 
https://genius.com/Jorge-luis-borges-on-exactitude-in-science-annotated. 

10 The project name in Arabic, “Kull al-Nuqat,” is particularly apropos, as the root n.q.t, the primary 
meaning of which is point or dot, can also refer to a location, village, market town, or even 
military outpost. See Hans Wehr, A Dictionary of Modern Written Arabic, ed. J. Milton Cowan, 
3rd ed. (Ithaca, NY: Spoken Language Services, 1976), 993. My thanks to Reuven Firestone for 
“pointing” this out to me. 
Geographically speaking, the map documents all modern Jewish settlement 
in the wider region throughout this period. For example, it shows the five com- 
munities that briefly existed in the Hauran region of today’s Syrian Arab Repub- 
lic in the 1890s, as well as the settlements founded during the period of Israeli 
control over the Sinai Peninsula between 1967 and 1981. By contrast, non-Jewish 
communities are only covered in the region that remains today the active zone 
of Israeli control and influence - i.e., internationally recognized Israel, plus the 
Golan Heights, West Bank, and Gaza Strip. To leave out Jewish attempts to settle 
the Hauran or Sinai would distort the past on the basis of the present, whereas 
to attempt to include their non-Jewish communities would discount the marginal 
role these regions play in the overall story of “All the Points” and pose serious, 
potentially insurmountable, challenges concerning data collection and accuracy. 


3 Plotting Points: Classifying Communities 


With regard to the classification of settlements, “All the Points” features a two-tiered 
scheme that enables varying degrees of detail depending upon viewer background 
and interest, and also allows for potential expansion in the future. Each commu- 
nity type is assigned a letter in the project database (spreadsheet) corresponding 
to a color on the map - e.g., A (green) denotes Arab communities; K (red) is for 
kibbutzim. In addition, each sub-type is indicated by a number, expressed in turn 
on the map in the form of a shape. Thus, Bedouin Arab communities are indicated 
by A2 (green square), whereas K7 (red Magen David or Star of David) is for kibbut- 
zim affiliated with the Religious Kibbutz Movement. The basic version of the main 
map, intended for beginning students or others with little prior knowledge, dis- 
tinguishes solely among settlement types, with all points appearing as small dots, 
varying only by color. On the advanced or detailed map, by contrast, the points are 
presented in different shapes and colors, reflecting both types and sub-types. Table 
1 represents a chart of the complete classification system as it exists at present, 
while Figures 2, 3, and 4 depict the advanced main map in three different years. 


Table 1: The “All the Points” Classification System. 

1) ARAB COMMUNITIES (GREEN) 
    A: Arab Community [CIRCLE] 
    A1: Arab City or Village [CIRCLE] 
    A2: Bedouin Arab City, Village, or Tribe [SQUARE] 
    A3: Majority or Entirely Druze Village [STAR] 
    A4: Unrecognized Arab Community [DIAMOND] 
    A5: Unrecognized Bedouin Arab Community [TRIANGLE] 
    A6: Palestinian Refugee Camp [HEXAGON] 
    A7: Arab, Status to be determined [QUESTION MARK] 

2) MA‘ABARAH/TRANSIT CAMP (MAROON) 
    B: Ma‘abarah Transit Camp [CIRCLE] 

3) COMMUNITY SETTLEMENTS (PURPLE) 
    C: Community Settlement [CIRCLE] 

4) EUROPEAN CHRISTIAN COLONIES, E.G. GERMAN, RUSSIAN, AMERICAN, GREEK (PINK) 
    E: European Colony [CIRCLE] 
    E1: German Templer Colony [SQUARE] 
    E2: Non-Templer German Colony [STAR] 

5) KIBBUTZIM (RED) 
    K: Kibbutz [CIRCLE] 
    K1: United Kibbutz Movement [CIRCLE] 
    K2: United Kibbutz [CIRCLE] 
    K3: Union of the Collectives and the Kibbutzim [SQUARE] 
    K4: Kibbutz of the Land [TRIANGLE] 
    K5: Association of Collectives [SQUARE] 
    K6: Union of the Kibbutzim [DIAMOND] 
    K7: Religious Kibbutz Movement [MAGEN DAVID/JEWISH STAR] 
    K8: Other Religious Kibbutz [STAR] 
    K9: Kibbutz, Status to be determined [QUESTION MARK] 

6) MOSHAVIM (BLACK) 
    M: Moshav [CIRCLE] 
    M1: Moshav ‘Ovdim [CIRCLE] 
    M2: Moshav Shitufi [SQUARE] 
    M3: Moshav Po‘alim [TRIANGLE] 
    M4: Moshav, Status to be determined [QUESTION MARK] 

7) NAHAL ARMY/AGRICULTURAL SETTLEMENTS (ORANGE) 
    N: Nahal Settlement [CIRCLE] 

8) MOSHAVA FARMING COLONIES (YELLOW) 
    O: Moshava [CIRCLE] 

9) JEWISH AND/OR MIXED CITIES, TOWNS, VILLAGES 
    U1: Generic, Secular or Religious Jewish Majority City or Town [BLUE CIRCLE] 
    U2: Jewish/Arab Mixed City [HALF BLUE/HALF GREEN CIRCLE] 
    U3: Development Town [BLUE SQUARE] 
    U4: Ultra-Orthodox/Haredi Jewish Majority City or Town [DARK BLUE CIRCLE] 

10) MISCELLANEOUS COMMUNITIES (GRAY) 
    X: Miscellaneous [CIRCLE] 
    X1: Circassian Village [SQUARE] 
    X2: Jewish: Miscellaneous or Status to be determined [MAGEN DAVID/STAR OF DAVID] 
    X3: Youth or Educational Village [TRIANGLE] 
    X4: Agricultural School or Farm [DIAMOND] 
    X5: Other, Status to be determined [QUESTION MARK] 
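
For readers who think more readily in code, the two-tiered scheme of Table 1 can be sketched as a simple lookup from a classification code to a map symbol. This is only an illustration under stated assumptions, not the project's actual ArcGIS configuration: the names TYPE_COLORS, SUBTYPE_SHAPES, and symbol_for are hypothetical, and category 9 (the U codes) is left out because its colors vary by sub-type rather than by letter.

```python
# Colors for each community-type letter, taken from Table 1.
# Category 9 (U codes) is omitted: its colors differ per sub-type.
TYPE_COLORS = {
    "A": "green",   # Arab communities
    "B": "maroon",  # ma'abarah / transit camp
    "C": "purple",  # community settlements
    "E": "pink",    # European Christian colonies
    "K": "red",     # kibbutzim
    "M": "black",   # moshavim
    "N": "orange",  # Nahal settlements
    "O": "yellow",  # moshava farming colonies
    "X": "gray",    # miscellaneous communities
}

# Shapes for each sub-type code, taken from Table 1.
SUBTYPE_SHAPES = {
    "A1": "circle", "A2": "square", "A3": "star", "A4": "diamond",
    "A5": "triangle", "A6": "hexagon", "A7": "question mark",
    "E1": "square", "E2": "star",
    "K1": "circle", "K2": "circle", "K3": "square", "K4": "triangle",
    "K5": "square", "K6": "diamond", "K7": "magen david", "K8": "star",
    "K9": "question mark",
    "M1": "circle", "M2": "square", "M3": "triangle", "M4": "question mark",
    "X1": "square", "X2": "magen david", "X3": "triangle", "X4": "diamond",
    "X5": "question mark",
}


def symbol_for(code: str, detailed: bool = True) -> tuple[str, str]:
    """Return (color, shape) for a classification code such as 'K7'."""
    color = TYPE_COLORS[code[0]]
    if not detailed:
        # Basic map: every point is a small dot; only the color varies.
        return color, "dot"
    if len(code) == 1:
        # A bare type letter with no sub-type number is drawn as a circle.
        return color, "circle"
    # Advanced map: the sub-type number selects the shape.
    return color, SUBTYPE_SHAPES[code]


print(symbol_for("K7"))                  # Religious Kibbutz Movement
print(symbol_for("A2", detailed=False))  # Bedouin community on the basic map
```

Keeping color and shape in separate tables mirrors the split between types and sub-types, so a new sub-type would only require one additional entry.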


There will be numerous options to customize the main map's appearance. 
With regard to the base map underlying the points, viewers will be able to 
choose from among a topographical map; a contemporary map indicating both 
present borders and armistice lines, as well as highways and streets across the 
country (and thus the approximate current physical size of communities); his- 
torical maps, including maps reflecting the external borders and administrative 
divisions of the land under Ottoman, British, Israeli, and Palestinian rule; and 
maps of the various initiatives proposed to resolve the Israeli-Palestinian conflict, 
including the Peel Commission Plan (1937), UN Partition Plan (1947), Allon Plan 
(1967), Clinton Parameters (2001), and Trump peace plan (2020), to name but a 
few. In addition, users of the site will be able to decide whether to view every 
form of settlement or instead to filter in or out specific types (on the basic map), 
or types and sub-types (on the advanced map). Concerning the map's timeline, 
they will choose whether to examine a specific year, set a range of years, or move 
forwards or backwards in time across the entirety of the covered period, watching 
as communities emerge, disappear, and change their identities; the speed with 
which the map moves from year to year is also subject to adjustment. Finally, it 
will also be possible to zoom in or out of particular regions, and to click on indi- 
vidual points for further information and links to relevant websites. 
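
The combination of time bar and type filter described above can be sketched, under stated assumptions, as a small selection function. The Community record, the visible_points helper, and the sample entries are all hypothetical placeholders introduced for illustration; they are not drawn from the project spreadsheet and do not reflect how ArcGIS itself implements these controls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Community:
    name: str
    code: str                  # classification code, e.g. "K7" or "A1"
    start: int                 # first year the point appears on the map
    end: Optional[int] = None  # last year it appears; None = still present


def visible_points(points, year, hidden_types=frozenset()):
    """Return the communities to draw for `year`, minus filtered-out types."""
    return [
        p for p in points
        if p.start <= year
        and (p.end is None or year <= p.end)
        and p.code[0] not in hidden_types
    ]


# Purely illustrative records; names and dates are invented placeholders.
SAMPLE = [
    Community("Placeholder Kibbutz", "K1", 1925),
    Community("Placeholder Moshav", "M3", 1907, 1937),
    Community("Placeholder Village", "A1", 1840),
]

# Step through a few years with one community type (A) filtered out.
for year in (1900, 1930, 1950):
    names = [p.name for p in visible_points(SAMPLE, year, hidden_types={"A"})]
    print(year, names)
```

Moving the time bar then amounts to calling the same function with successive years, while a range of years could be handled by checking for overlap between the range and each community's start and end dates.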
Figure 2: Main map, Ottoman Palestine (1917), Solomon Vimal. 

Figure 3: Main map, British Mandatory Palestine (1947), Solomon Vimal. 


4 At What Point a Point? 


As indicated above, the all in the project title reflects my desire to represent every 
kind of settlement and to be as comprehensive as possible. But what merits a 
point on the map? Or, stated otherwise, what counts as a settlement? Our general 
rule of thumb is to include any group of people beyond an individual family unit 
that resided together in a specific site for at least the better part of a year and had 
the sense that they constituted a distinct community. It is of no consequence if 
the community quickly moved or disbanded, or whether or not it received official 
recognition or sanction; it existed for a period of time and is therefore worthy of 
inclusion on the map. Thus, aside from private family farms and nomadic tribes, 
all other kinds of settlements are ultimately intended to be present, including 
monasteries, Palestinian refugee camps, the ma'abarot transit camps used to 
Figure 4: Main map, Israel & the Palestinian Territories 
(2021), Solomon Vimal. 


house immigrants to the new State of Israel in the 1950s, unrecognized Bedouin 
villages in the Negev, and settler outposts on the West Bank. To be sure, because 
many of these kinds of communities expose the limits of state oversight and 
control - and thus also of data collection - it will not always be possible to represent 
them to our satisfaction. To evoke Ma‘aleh Avak, there will likely always be 
points missing from, and incorrectly represented on, the map. 

For instance, a ma'abarah ("transit camp") may have been technically closed 
in, say, 1958, its erstwhile residents offered permanent quarters elsewhere, but 
several dozen families may have continued to reside there nonetheless. Similarly, 
the official evacuation of a settler outpost does not exclude some residents remain- 
ing or managing to return immediately thereafter. Finally, it may be difficult to 
determine if a particular area of Bedouin Arab settlement constitutes one or mul- 
tiple communities. In each case, we will make the best determination possible in 
light of the available data. Fortunately, an online map is always subject to revision. 
With regard to community start dates, we try to get as close as possible to 
the moment in time at which a site was first actually populated or regarded itself 
as a distinct community, regardless of when it was officially founded. As for end 
dates, in addition to the problems raised in the previous paragraph, we are often 
faced with the challenge of appropriately representing the historical trajectory of 
an individual community’s relations with its neighbor(s). For example, if two or 
more previously independent settlements are combined into a single municipal- 
ity, should they from that point on be denoted on the map as a single point? Here, 
too, we believe that case-by-case determination is essential. 

Consider, as an example, ‘Ein Ganim, the first moshav po‘alim or workers’ 
moshav, founded in 1907 and formally annexed by neighboring Petah Tikvah in 
1937. At around that time, it ceased to maintain its identity as a distinct commu- 
nity and for this reason does not merit a separate point thereafter. By contrast, 
the Palestinian Arab village of Silwan was fully annexed by the Jerusalem munic- 
ipality in 1952, but retains its unique character to this day and thus still warrants 
a distinct point. Likewise, on the West Bank, the Jewish settlements of Alon and 
Nofei Prat are officially part of nearby Kfar Adumim, but since these three commu- 
nities are all geographically distinct, they are each represented by discrete points. 
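
The start- and end-date rules just described can be sketched as a small record with optional years. This is a hypothetical illustration rather than the project's actual spreadsheet schema: the field names (name, start_year, end_year) and the method visible_in are assumptions, as is the choice to treat the annexation year as the last year shown. Only the dates for ‘Ein Ganim and the continued presence of Silwan come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Point:
    name: str
    # None means the start date is unknown or predates the map's range
    # (an assumption made for this sketch).
    start_year: Optional[int]
    # None means the community still warrants a distinct point.
    end_year: Optional[int] = None

    def visible_in(self, year: int) -> bool:
        """Return True if the point should be drawn for the given year."""
        started = self.start_year is None or self.start_year <= year
        not_ended = self.end_year is None or year <= self.end_year
        return started and not_ended


# Founded in 1907, annexed by Petah Tikvah in 1937 (dates from the text).
ein_ganim = Point("'Ein Ganim", start_year=1907, end_year=1937)
# Annexed in 1952 but still a distinct point, so no end year is recorded.
silwan = Point("Silwan", start_year=None)
```

A year-by-year slideshow would then simply filter the full list of points with visible_in(year) for each frame; the case-by-case determinations discussed above are captured in whether, and when, an end year is entered.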


5 Procuring Points: Sources and Reliability 


The data underlying “All the Points” are stored on the project spreadsheet (a 
Google "sheet"),¹¹ from which ArcGIS draws directly. Much of this information was
obtained by scraping two large data sets, both of which provided geographical 
coordinates for the communities they contained: the 1945 British Survey of Palestine 
and the 2017 List of Localities produced by the Israel Central Bureau of Statistics 
(CBS). Drawing upon these two sources brought many advantages. For example, 
the 1945 survey included the Palestinian villages destroyed between 1947 and 1949 
or subsequently, whereas the 2017 list comprised all recognized Israeli communi- 
ties (Jewish or Arab) extant in that year, including the West Bank settlements. 
Nonetheless, the joint use of these two data sets also occasioned a number 
of challenges, including duplicates and lacunae. Pre-1945 Jewish and Arab com- 
munities that still existed in 2017 Israel appear in both sources - often with 
quite different spellings, rendering the respective entries difficult to identify 
and combine. And missing are the Jewish communities founded after 1945 that 
by 2017 had ceased to exist — e.g., the Sinai communities evacuated as a result 


11 The spreadsheet exists in multiple copies and is regularly downloaded and backed up. 
of the 1978 Camp David Accords between Israel and Egypt, the Gaza and West
Bank settlements uprooted in the 2005 Disengagement, as well as other commu- 
nities that failed, moved, or changed their name or status, etc. Finally, because 
these sources did not classify settlements in the same manner as “All the Points,”
further research was required. Nonetheless, scraping these two sources very 
quickly yielded a rather comprehensive dataset. 
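
The duplicate problem described above (the same community appearing in both sources under different spellings) can be approached by pairing entries whose coordinates are close and whose names are similar. The following is a minimal sketch under stated assumptions, not the procedure actually used for the project: the row layout (name, lat, lon), the function names, the thresholds, and the sample rows are all hypothetical, and the sample names and coordinates are invented placeholders rather than real data.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Case-insensitive similarity ratio between two spellings (0 to 1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(old_rows, new_rows, max_km=2.0, min_ratio=0.6):
    """Pair entries that are both nearby and similarly named.

    The thresholds are illustrative guesses; flagged pairs are
    candidates for manual review, not confirmed matches.
    """
    pairs = []
    for old in old_rows:
        for new in new_rows:
            close = haversine_km(old["lat"], old["lon"],
                                 new["lat"], new["lon"]) <= max_km
            similar = name_similarity(old["name"], new["name"]) >= min_ratio
            if close and similar:
                pairs.append((old["name"], new["name"]))
    return pairs


# Invented placeholder rows standing in for the 1945 and 2017 sources.
survey_1945 = [{"name": "Example Village", "lat": 32.0, "lon": 35.0}]
list_2017 = [
    {"name": "Exampel Vilage", "lat": 32.005, "lon": 35.004},
    {"name": "Other Town", "lat": 31.0, "lon": 34.5},
]
candidates = likely_duplicates(survey_1945, list_2017)
```

Requiring both conditions keeps the candidate list short: proximity alone would pair adjacent but distinct communities, while name similarity alone would pair namesakes in different regions. Transliteration differences can be larger than a character-level ratio tolerates, so a human check of each candidate pair remains necessary.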

Other communities were added or nuanced based on my own research and 
that of students who have contributed to the project, with Wikipedia serving as 
the primary information source. While by no means the final word, Wikipedia has 
proven to be an excellent starting point: nearly every community present or past 
has its own entry in English, Hebrew, and often also in Arabic, and the information
required for “All the Points” is, frankly, quite minimal, and thus typically provided:
name, date of founding (and, if relevant, disappearance and/or destruction),
classification, and geographical coordinates. Community and organization
websites have also been consulted,¹² as has published scholarly research. Viewers
can obtain this information and its sources by clicking on individual points, and 
cases in which additional clarification is required are always noted. Furthermore, 
as the project is ongoing, the dataset is continually being improved. The goal in 
this initial stage of the project is, in any case, to produce as extensive and detailed 
a template map as possible, as a springboard to sufficient funding for a team of 
researchers. They, in turn, will further nuance, supplement, and — where neces- 
sary — correct the information we have thus far obtained. Precisely because of 
the provisional nature of both the data and the maps, we do not feel comfortable 
making either publicly available just yet.¹³ Parties interested in examining and/or
contributing to our data and maps are invited to reach out to us privately. 


6 Precision Points: Curated Maps 


Alongside the basic and detailed primary maps introduced above, “All the Points”
involves the production of more specialized, curated map exhibits. While also 


12 The Palestine Remembered website (https://www.palestineremembered.com) has been 
a helpful source of data for the depopulated Palestinian villages, while the nvnasn section of 
http://www.nahal.co.il/, a website devoted to preserving the memory of fallen Nahal soldiers and 
the overall Nahal heritage, has been used extensively in tracking the dates on which Nahal out- 
posts were founded and either civilianized, as was typically the case, or removed. An acronym for 
No‘ar halutzi lohem (“Fighting Pioneer Youth”), Nahal refers to a program that traditionally
combined military service with the founding of agricultural settlements, primarily in peripheral areas.
13 Our goal is formally to launch the open access website of the project in 2023. 
designed to be interactive, these maps focus on specific, pre-determined regions or
aspects of settlement history that merit more focused examination and/or require
more precise chronological scale than is possible with an all-encompassing map.
Here is a list of the curated maps currently in development:

1) “‘In the Land of Jesus’: European Christian Colonies in the Holy Land (1867-1948)”
2) “‘Wall and Tower’: The Emergence and Spread of an Iconic Settlement Type (1936-1939)”
3) “‘From Palestine to Israel’: The Disappearance and Appearance of Arab and Jewish Settlements (November 1947-November 1949)”
4) “‘Peripheral Points’: From Nahal Military Outpost to Civilian Settlement (1951-2013)”
5) “‘Arab Settlement in the Jewish State’: Recognized and Unrecognized Arab Communities Founded in Israel since 1948”
6) “‘Home on the Hilltop’: The Founding of Galilean ‘Mitzpim’ and West Bank Settlements and Outposts (1967-2000)”
7) “‘Jerusalems’: Arab, Jewish, and Other Neighborhoods in the Holy City and Environs”
8) “‘Settled, Unsettled, Resettled’: The Golan Heights (1878-Present)”
9) “‘Points That Never Were’: Fictional Communities in Israel/Palestine”


In what follows, I elaborate upon three of these maps: “Wall and Tower,” “Settled, 
Unsettled, Resettled,” and “Points that Never Were.” 


6.1 “Wall and Tower” 


Alternatively translated as either “Wall and Tower” or “Tower and Stockade,” the 
Hebrew phrase Homa u-Migdal refers to the 52 impromptu Jewish settlements that 
were set up (often overnight) during the Arab Revolt of 1936-1939. Initially, these 
communities comprised, quite literally, a perimeter wall and lookout tower. As 
it became increasingly clear in this period that Mandatory Palestine was headed 
for some form of partition, the founding of these settlements was intended to 
secure and/or expand the Jewish presence in strategic territory. This particular 
enterprise occupies a central place in Israeli collective memory, gave birth to the 
notion of “another point on the map,” and has exerted substantial influence on 
subsequent Israeli architecture and settlement (in terms of both strategy and 
practice alike), particularly on the post-’67 West Bank.¹⁴ It is thus highly appropriate


14 Rotbard, “Wall and Tower,” 49-50.
to produce a curated map devoted to the “Wall and Tower” communities.
This map distinguishes between kibbutzim and moshavim (the two kinds of
“Wall and Tower” settlement) and, in light of the brief time frame, showcases
the founding of these communities on a month-to-month basis. See Figure 5 for a
composite view of the “Wall and Tower” curated map.


Figure 5: Curated Map, “Wall and Tower” (1936-1939), Solomon Vimal.


6.2 “Settled, Unsettled, and Resettled”


The purpose of this curated map is to showcase the modern settlement history of 
the Golan Heights, a history as dramatic as it is poorly known. This map is espe- 
cially timely, as the Trump Administration's recognition in 2019 of Israeli sover- 
eignty over the majority of the strategic plateau (seized from Syria during the 1967 
Six-Day War) subjected this region to an unusual degree of media attention.¹⁵


15 In February 2021, new U.S. Secretary of State Antony Blinken partially walked back this
decision. 
The modern settlement history of the Golan commences only in 1878 with
the founding of the Circassian town of Quneitra and other nearby villages.¹⁶
Thereafter, the establishment of new communities continued at a varying, but at 
times rapid, rate (up to an average of six or eight new settlements per year), such 
that on the eve of the 1967 war, the Heights were host to 273 communities with a 
combined population of approximately 150,000. While it is unlikely that we will 
acquire sufficient data to map the founding of these communities on an annual 
basis, thanks to the assistance of Israeli Golan Heights scholar Yigal Kipnis, 
we are already able to produce maps showcasing the communities founded in 
the periods 1878-1884, 1885-1913, 1914-1945, and 1945-1967, and to denote the 
predominant ethno-religious group in each settlement, whether Bedouin Arab, 
Circassian, Turkmenian, Druze, Alawite, or Christian Arab.¹⁷ Two hundred and
twenty-three of these communities (all villages, with the exception of Quneitra) 
were located in the territory conquered by Israel, and all but five of them were 
abandoned by their inhabitants and ceased to exist.¹⁸ Thereafter, a modest Israeli
settlement drive began, resulting in the founding of the 32 Jewish communities 
extant on the Golan today.¹⁹ In 1981, Israel also formally applied its law to the
Golan Heights, effectively annexing the region. Thus, it is anticipated that the 
individual images of “Settled, Unsettled, and Resettled” will depict the Golan
pre-1878, in 1884, 1913, 1945, May 1967, June 1967, 1981, and 2022.


16 The contents of this section draw heavily from Yigal Kipnis, The Golan Heights: Political
History, Settlement and Geography since 1949 (London: Routledge, 2013), 125-48 (Chapter 3,
“The Settlement Map of the Syrian Golan”).

17 Bedouin Arabs constituted the majority of the inhabitants, with a significant presence of 
both Circassians and Druze and, to a lesser extent, of Turkmenians and Christians. See Kipnis, 
The Golan Heights, 136, for a breakdown of the population on ethno-religious lines. 

18 In two cases, a few residents remained behind in other settlements. See Kipnis, The Golan 
Heights, 145. 

19 33, if one counts Ramat Trump (“Trump Heights”), officially founded in honor of the former
American president on June 16, 2019. The first of a planned initial 20 families moved to the new
community in April 2021. See Guy Varon, “The First Family Has Come to Live in Ramat Trump:
‘Strange without Neighbors, but it’s an Adventure’” [Hebrew], N12, April 7, 2021, accessed November
29, 2021, https://www.mako.co.il/news-israel/2021_q2/Article-2d78a08db4ba871026.htm;
“Community Settlement Ramat Trump, Golan” [Hebrew], Atar ha-bayit: atar ha-megorim ba-kfar (The
Home Site: The Site for Village Residences), undated, accessed November 29, 2021,
https://www.homee.co.il/27238*0-ni3/?s-17932.
6.3 “Points That Never Were”


Perhaps the most surprising (and certainly the most creative) aspect of a project 
committed to depicting “who has lived where, when, and in what kind of com- 
munity” is “Points That Never Were” (“Nekudot she-lo hayu me-‘olam”). The aim 
of this curated map is to document settlements of the imagination - i.e., communities
that exist only on the page, on the screen, or onstage. Probably the best-known
example thereof is Beit ha-Tikvah (“House of Hope”), the desert development
town mistakenly visited by the Alexandria Ceremonial Police Orchestra in
the award-winning musical The Band's Visit (2016—) and, previously, in the simi- 
larly acclaimed 2007 Israeli film of the same name (Bikur ha-Tizmoret, in Hebrew). 

Fantasy settlements have, however, long been a staple of artistic engagement 
with Israel. To offer but a few examples: the renowned Israeli satirist Ephraim 
Kishon set his 1955 novel ‘Ein Kamonim (Cumin Spring) at a fictional moshav
bearing the same name deep in the Galilee,²⁰ while for his 1965 debut novel
Makom aher (Elsewhere, Perhaps), Amos Oz did in fact invent an “other place”
(the literal translation of the Hebrew title), namely Kibbutz Metzudat Ram (“Lofty
Fortress”). Indeed, the novel's very first page offers a rather precise description
of its location near the Israel-Syria border north of the Sea of Galilee. Passing 
to North America, readers of Leon Uris's saccharine Exodus or James Michener's 
sweeping The Source may recall that these books go so far as to include maps in 
which their fictional Israeli communities are fixed in precise locations.²¹

More recently, Assaf Gavron created the outpost Ma'aleh Harmesh Gimel
(“Scythe Ascent 3”) for Ha-giv'ah (The Hilltop), his 2013 novel about a West Bank
settlement. And even Theodore Herzl, in his 1902 fantasy novel Altneuland (Old-New
Land), dreamt into existence, just outside Nazareth, Neudorf or “New
Village,” an idealized, cooperative farming community. Finally, it would be remiss
not to note the presence on this map of the Ma'aleh Avak mentioned above, here
placed way in the south, since - as resounds in the song - sham le-mata‘ haserah
'od nekudah (“down below another point is missing”). See Figure 6 for a composite
image of “Points That Never Were.”


20 The novel - still set at ‘Ein Kamonim - was republished in 1972 under the title Ha-shu'al be-lul 
ha-tarnegolim (The Fox in the Chicken Coop). A film bearing the revised title was released in 1977 
and a private farm bearing the name “‘Ein Kamonim” was founded in 1979-1980 southwest of 
Safed on Route 85. To be sure, the name is also inspired by the presence nearby of Har Kamon, 
the highest peak in the Lower Galilee. 

21 Leon Uris, Exodus (Garden City, NY: Doubleday, 1958); James A. Michener, The Source: A Novel 
(New York: Fawcett, 1965).
Figure 6: Curated Map, “Points That Never Were,” Solomon Vimal.


Paradoxically, the sheer number of these fantasy points (there are undoubtedly
many more awaiting inclusion on this map) provides perhaps the very best
evidence for the importance of thinking about the history of Zionism and Israel 
from the perspective of the real points to which they gave rise. And thus, this most 
unusual approach to mapping Israel strikes me as a worthy capstone both for the 
project as a whole and for this exposition thereof. 


7 A Few Final Points


To be sure, "All the Points" is an ambitious and challenging undertaking, the 
kind of initiative likely to resist any clear point of completion. Indeed, as of this 
writing, it is hard to predict what complications lie ahead and to what degree the 
project will be able to be fully realized as intended. And yet, at the same time, 
the experience thus far of dynamically mapping Israeli and Palestinian commu- 
nities raises the possibility of extending this same approach to other regions and 
periods with contested histories. “All the Points” may thus ultimately come to
encompass even more points than initially anticipated. 
Figure 7: Curated map, “Peripheral Points” (here 1951-1977), Solomon Vimal.

Figure 8: Curated map, “Home on the Hilltop” (1981), Solomon Vimal.


Even in its current, pilot state, however, “All the Points” represents a valuable 
pedagogical tool and points (pun indeed intended!) to new ways of learning and 
thinking about the modern history of the Israeli-Palestinian region. It is extremely 
powerful to watch the year-by-year slideshow of the main map: one sees Israel emerge
before one’s eyes and observes how the predominant settlement types evolve over 
time; at the same time, the mass disappearance of Palestinian villages in the course 
of 1948 is also highly visible and makes a vivid impression. In particular, the coding 
by color and shape gives instant expression to this land’s astounding diversity, stim- 
ulating the viewer’s interest and inviting them to explore further. Many lesser-known 
features of the region’s modern settlement history are likewise rendered visible here, 
including the surprisingly high number of communities in which both Arab and 
Jewish populations were present in the 19th and early 20th centuries, and are, once 
again, in the early 21st; the geographical extent of early Zionist colonization efforts; 
Arab communities founded in the State of Israel (i.e., post-1948); and the non-Arab, 
non-Jewish settlements to which this land has also been host. 
Perhaps the most significant contribution of “All the Points” is that it invites
us to tell a history of Israel not focused on the major centers or iconic locales, 
but rather from the perspective of the periphery in, i.e. from the “out-of-the-way” 
points that, taken in aggregate, are at least as essential to the story. In this regard, 
what might at first glance appear to be a limitation of “All the Points,” namely 
that its points reflect neither the number of inhabitants nor the geographical 
extent of the communities they represent, actually reveals itself, upon closer con- 
sideration, to be one of its strengths. Indeed, the fact that each point is the exact 
same size encourages the viewer to take serious note of settlements that would 
otherwise easily slip under the radar. 

To be sure, I do not exclude the eventual representation of population and 
area in some manner, perhaps on a decadal, as opposed to annual, basis. In other 
words, “who has lived where, when, and in what kind of community?” may yet 
also encompass “how many have lived in a where of what size?” But, for now, the 
limited focus of “All the Points” redounds nicely to its dual, intertwined purposes, 
namely (1) to showcase the contribution of each point to the ongoing making and 
re-making of the Israeli-Palestinian map, and (2) to celebrate each of these points 
as an experiment in human community worthy of attention in its own right — i.e., 
to forget, for a moment, that map in favor of each and all of its points.” 
Abstract: La Vara, El Tiempo, and La Boz De Oriente represent three of the major
historic Ladino newspapers published across the diasporic Sephardic Jewish world 
in the twentieth century. While Sephardic Jewish history and culture have received 
increasing scholarly attention in recent years, the vast corpus of Ladino newspa- 
pers largely remains unmined, and the field continues to be marginal from the per- 
spective of Jewish Studies. In this chapter, I apply computational analysis of the 
visual content to explore the Ladino press at a macroscopic level. Using a machine 
learning model that I developed for my project, Newspaper Navigator, I have con- 
structed a dataset of extracted photographs, illustrations, maps, comics, editorial 
cartoons, and advertisements from over 15,000 digitized pages of Ladino newspa- 
pers. This method represents an emerging approach to digital humanities research 
with periodicals and presents opportunities to facilitate access and research within 
Jewish Studies. 

With this extracted visual content, it is possible to study the transnational 
dynamics shaping Sephardic print culture and the broader Sephardic experience 
at an unprecedented scale. Accordingly, I describe my analysis of this visual 
content using emerging techniques in order to provide insights related to motifs 
and temporal trends. I offer this work as a case study in interdisciplinary research 
in the digital humanities and Jewish Studies. In addition, I offer methodological 
reflections related to applying emerging computational techniques to Jewish
Studies. I conclude with a reflection on the ethical considerations of applying 
machine learning techniques to Ladino newspapers and, more generally, to 
Jewish cultural heritage. 


Keywords: Newspaper Navigator, machine learning, artificial intelligence, digi- 
tized newspapers, Sephardic Studies, Ladino, digital humanities, computing cul- 
tural heritage 


La Vara, El Tiempo, and La Boz De Oriente represent three of the major historic 
Ladino newspapers published across the diasporic Sephardic Jewish world in the 
20th century - from New York to Constantinople to Istanbul. While Sephardic 
Jewish history and culture have received increasing scholarly attention in recent 
years, the vast corpus of Ladino newspapers largely remains unmined, and the 
field remains marginal from the perspective of Jewish Studies. A new collabora- 
tion between the Stroum Center for Jewish Studies' Sephardic Studies Program 
and the Paul G. Allen School for Computer Science & Engineering at the Univer- 
sity of Washington seeks to draw on innovative machine learning techniques to 
render the visual content of Ladino newspapers more accessible to scholars and 
students alike and, in so doing, change the trajectory of Sephardic Studies writ 
large. This chapter reports the findings of this research. 

Many Ladino titles have been digitized by the Sephardic Studies Program at 
the University of Washington. These Ladino newspapers contain not just articles 
and editorials but also an abundance of rich visual content that sheds light on 
Sephardic Jewish experiences in modernity. The advertisements appearing within 
Ladino newspapers have received attention from scholars within Sephardic 
Studies,¹ and analysis thereof has revealed connections between the American
Ashkenazic and Sephardic communities, as well as the ways in which advertisers' 
attempts to provide remedies speak to “readers’ anxiety about the fragility of life 
under Ottoman rule.”² Indeed, the visual content within newspapers has proven
to be a capacious source for humanists. Within periodicals studies, scholars have 
utilized the visual content in newspapers to investigate topics ranging from the
evolution of comedic sensibilities within comic strips to hidden editorial practices


1 Makena Mezistrano, “Why Are These Passover Ads Different from All Other Ads?” Stroum
Center for Jewish Studies (2021), accessed May 19, 2021,
https://jewishstudies.washington.edu/sephardic-studies/why-are-these-passover-ads-different-from-all-other-ads/.

2 Sarah Abrevaya Stein, Making Jews Modern: The Yiddish and Ladino Press in the Russian and 
Ottoman Empires (Bloomington: Indiana University Press, 2004), 179. 
embedded within newspaper layout.³ This collective body of work is bolstered by
new methodologies being employed within the digital humanities to extract and
analyze visual content in historic newspapers.⁴

In this chapter, I scale up this analysis of visual content to explore the Ladino 
press at a macroscopic level. Using a machine learning model that I developed as 
part of my project, Newspaper Navigator, I have constructed a dataset of extracted 
photographs, illustrations, maps, comics, editorial cartoons, and advertisements 
from over 15,000 digitized pages of these Ladino newspapers.⁵ This approach of
utilizing a machine learning model to extract visual content represents an emerg- 
ing methodology for digital humanities research with periodicals and presents 
opportunities to facilitate access and research within Jewish Studies. 

With the extracted visual content from the Ladino newspapers, it is possi- 
ble to study the transnational dynamics shaping Sephardic print culture and the 
broader Sephardic experience at an unprecedented scale. Accordingly, I describe 
my results related to analyzing this visual content using emerging visualization 
techniques in order to provide insights related to recurring motifs and temporal 
trends. I offer this work as a case study in interdisciplinary research in the digital 
humanities and Jewish Studies. Throughout the chapter, I offer methodological 
reflections related to applying emerging computational techniques to Jewish 
Studies. I conclude the chapter with a reflection on the ethical considerations 
of applying machine learning and computer vision techniques to these Ladino 
newspapers and, more generally, to Jewish cultural heritage. 


3 Jean Lee Cole, How the Other Half Laughs: The Comic Sensibility in American Culture, 1895-1920 
(Jackson: University Press of Mississippi, 2020); Kevin G. Barnhurst and John Nerone, The Form of 
News: A History (New York: The Guilford Press, 2002). 

4 Andrew Piper, Chad Wellmon, and Mohamed Cheriet, “The Page Image: Towards a Visual History
of Digital Documents,” Book History 23, no. 1 (2020): 365-97, accessed December 10, 2020,
https://doi.org/10.1353/bh.2020.0010; Paul Fyfe and Qian Ge, “Image Analytics and the
Nineteenth-Century Illustrated Newspaper,” Journal of Cultural Analytics, October 25, 2018, 11032,
accessed December 15, 2020, https://doi.org/10.22148/16.026; Melvin Wevers and Thomas Smits,
“The Visual Digital Turn: Using Neural Networks to Study Historical Images,” Digital Scholarship
in the Humanities 35, no. 1 (April 1, 2020): 194-207, accessed December 15, 2020,
https://doi.org/10.1093/llc/fqy085.

5 Benjamin Charles Germain Lee et al., “The Newspaper Navigator Dataset: Extracting Headlines
and Visual Content from 16 Million Historic Newspaper Pages in Chronicling America,” in
Proceedings of the 29th ACM International Conference on Information & Knowledge Management,
CIKM ’20 (New York: Association for Computing Machinery, 2020), 3055-62, accessed December
15, 2020, https://doi.org/10.1145/3340531.3412767; Benjamin Charles Germain Lee and Daniel S.
Weld, “Newspaper Navigator: Open Faceted Search for 1.5 Million Images,” in Adjunct Publication
of the 33rd Annual ACM Symposium on User Interface Software and Technology, UIST ’20
Adjunct (New York: Association for Computing Machinery, 2020), 120-122, accessed December
15, 2020, https://doi.org/10.1145/3379350.3416143.
1 The Digital Humanities and Visual Analysis of Newspapers


The visual culture preserved within historic newspapers has proven to be a 
fruitful and capacious source among scholars across diverse research areas. For 
example, scholars have studied the embedded editorial cartoons to understand 
the invocation of historical analogies;⁶ comic strips to understand the evolution
of humor about ethnicity;⁷ illustrations to study the portrayal of identity;⁸ maps
to assess cartographic practices as well as the spatial thinking abilities of readers;⁹
and photographs to study the history of photojournalism.¹⁰ Within Jewish
Studies, Sarah Stein's book Making Jews Modern: The Yiddish and Ladino Press 
in the Russian and Ottoman Empires makes a compelling case for the significance 
of visual culture within Ladino and Yiddish newspapers. Stein's detailed analy- 
sis of the advertisements within the Constantinople-based Ladino newspaper El 
Tiempo traces recurring motifs in order to argue how readers sought remedies to 
the anxieties of modernity under Ottoman rule: advertisements for medicines, 
insurance, clothes, travel, and lotteries all targeted readers concerned about stability
and class.¹¹ This chapter builds on this already significant body of work in
order to consider advertisements and other visual content in the Ladino press at 
the macroscopic scale. 

Scholarship making use of visual culture in historic newspapers has been 
redoubled by the growing interest in visual analysis within the digital humani- 
ties. Though research in the digital humanities has historically centered around 
text as the primary medium of interest, the field's “visual digital turn” over the


6 Betty H. Winfield and Doyle Yoon, “Historical Images at a Glance: North Korea in American
Editorial Cartoons,” Newspaper Research Journal 23, no. 4 (September 1, 2002): 97-100, accessed
June 10, 2021, https://doi.org/10.1177/073953290202300411.

7 Cole, How the Other Half Laughs.

8 Andrea N. Williams, “Cultivating Black Visuality: The Controversy over Cartoons in the
Indianapolis Freeman,” American Periodicals 25, no. 2 (2015): 124-38, accessed June 10, 2021,
http://www.jstor.org/stable/24589083.

9 Pinar Sarın and Necla Ulugtekin, “Analyzing Newspaper Maps for Earthquake News through
Cartographic Approach,” ISPRS International Journal of Geo-Information 8, no. 5 (May 2019): 235,
accessed June 10, 2021, https://doi.org/10.3390/ijgi8050235; André Reyes Novaes, Maps in
Newspapers: Approaches of Study and Practices in Portraying War since 19th Century (Leiden: Brill,
2019), accessed June 10, 2021, https://brill.com/view/title/54806.

10 Michael Griffin, “The Great War Photographs: Constructing Myths of History and
Photojournalism,” in Picturing the Past: Media, History and Photography, ed. Bonnie Brennan and Hanno
Hardt (Urbana: University of Illinois Press, 1999), 122-57.

11 Stein, Making Jews Modern, 153-201. 
past decade has begun foregrounding the analysis of visual media, including
images and video.¹² This visual digital turn has coincided with methodological
advances in machine learning approaches to image analysis due to deep learn- 
ing. With the democratization of deep learning approaches to image recognition 
over the past few years via open source libraries and pre-trained models, digital 
humanities practitioners have begun utilizing these approaches for a wide range 
of research goals, from enriching the metadata of digitized collections to analyzing
sitcoms.¹³ As these approaches continue to improve, it is clear that machine
learning will occupy an increasingly important role within the digital humani- 
ties and the humanities writ large, as well as within the cultural heritage sector, 
including libraries and archives. Within periodicals studies, researchers have 
begun utilizing machine learning approaches to study the visual components of 
newspaper pages, from analyzing visual layouts to classifying and searching the 
visual content embedded within the pages.¹⁴ Indeed, as a mode of humanistic
inquiry, the application of machine learning to the visual analysis of newspapers 
has much to offer to both Jewish Studies and the digital humanities. 

In the case of the Ladino press, the utilization of machine learning method- 
ologies to study the visual content embedded within the newspaper pages is even 
more urgent due to the extant challenges surrounding the application of optical 
character recognition (OCR) algorithms to transcribing Ladino texts. Off-the-shelf 
OCR algorithms have yielded poor performance to date because these algorithms 
interpret Ladino texts printed in Rashi script as Hebrew; the poor OCR quality in 
turn restricts the ability of scholars to perform reliable keyword searches or apply 
digital humanities methodologies for textual analysis. Though new OCR engines 


12 Lev Manovich, *How to Compare One Million Images?," in Understanding Digital Humanities, 
ed. David M. Berry (London: Palgrave Macmillan, 2012), 249-78, accessed December 15, 2020, 
https://doi.org/10.1057/9780230371934 14; Lev Manovich, “Data Science and Digital Art History,” 
International Journal for Digital Art History, no. 1 (June 26, 2015), accessed December 15, 2020, 
https://doi.org/10.11588/dah.2015.1.21631; Wevers and Smits, “The Visual Digital Turn." 

13 Joshua Gomez et al., “Experimenting with a Machine Generated Annotations Pipeline,” The 
Code4Lib Journal, no. 48 (May 11, 2020), accessed December 10, 2020, 
https://journal.code4lib.org/articles/15209; Matthew Lincoln et al., “CAMPI: Computer-Aided 
Metadata Generation for Photo Archives Initiative,” October 8, 2020, accessed December 15, 2020, 
https://doi.org/10.1184/R1/12791807v1; Elizabeth Lorang et al., “Digital Libraries, Intelligent 
Data Analytics, and Augmented Description: A Demonstration Project,” 2020, accessed 
December 15, 2020, https://labs.loc.gov/static/labs/work/experiments/final-report-revised_june-2020.pdf; 
Taylor Arnold, Lauren Tilton, and Annie Berke, “Visual Style in Two Network Era Sitcoms,” 
Journal of Cultural Analytics, July 19, 2019, 11045, accessed December 15, 2021, 
https://doi.org/10.22148/16.043. 

14 Piper, Wellmon, and Cheriet, “The Page Image”; Fyfe and Ge, “Image Analytics and the 
Nineteenth-Century Illustrated Newspaper”; Wevers and Smits, “The Visual Digital Turn”; Lee et al., 
“The Newspaper Navigator Dataset”; Lee and Weld, “Newspaper Navigator.” 
are being developed specifically for Ladino texts, a fundamental challenge 
remains at this time: how do we study the Ladino press at a macroscopic scale 
beyond close reading?¹⁵ Applying machine learning to study the visual content in 
these pages affords us a path forward. 

The next sections of this chapter concern the methodology employed to 
extract and analyze this visual content within 15,820 Ladino newspaper pages 
using machine learning. The chapter then turns to analyzing the extracted visual 
content. 


2 Constructing the Dataset of Excavated Visual 
Content 


This chapter explicitly builds on Newspaper Navigator, a project that I created in 
order to develop machine learning methodologies for extracting and analyzing 
visual content in historic newspapers. Just as the goal of OCR is to unlock the 
ability to search textual content, the goal of Newspaper Navigator is to unlock 
the ability to search visual content embedded within historic newspaper pages. 
The project utilizes a visual content recognition model to automate the identi- 
fication and classification of visual content within digitized historic newspaper 
pages according to seven different categories: photographs, illustrations, maps, 
comics, editorial cartoons, headlines, and advertisements. The visual content 
recognition model is an instantiation of a machine learning algorithm known as 
a neural network, which has achieved state-of-the-art performance across a range 
of machine learning tasks and domains, from images to text. 

The visual content recognition task considered in Newspaper Navigator is a 
form of object detection, a computer vision task formulated with the goal of iden- 
tifying and localizing objects in images. Accordingly, the visual content recogni- 
tion model began as an object detection model that had been trained by machine 
learning practitioners on the oft-used Common Objects in Context dataset to 
identify the locations of objects such as dogs and bicycles in images.¹⁶ In order 
to succeed at this more granular task of identifying visual content on digitized 
newspaper pages, I subsequently trained the model on crowdsourced bounding 
box annotations of visual content in World War I-era newspaper pages as part of 


15 DiJeSt Team, “Our Text Recognition Ground Truth and Model - DiJeSt,” accessed March 31, 
2021, https://dijest.net/gtmodel/. 

16 Tsung-Yi Lin et al., “Microsoft COCO: Common Objects in Context,” ArXiv:1405.0312 [Cs], 
February 20, 2015, accessed December 10, 2020, http://arxiv.org/abs/1405.0312. 
the Beyond Words initiative launched by the Library of Congress.¹⁷ This framework 
of utilizing a model that had already been trained on one task and training 
it on a more specialized, domain-specific dataset is known within machine learn- 
ing as pre-training and finetuning. An example of predictions made by the fully 
trained model on a sample newspaper page from Chronicling America is shown 
in Figure 1. 

In the construction of the Newspaper Navigator dataset, this trained visual 
content recognition model was utilized to extract visual content from 16.3 million 
newspaper pages in the Chronicling America database. The visual content recognition 
model, as well as the code and training dataset utilized to train it, is available 
at the Library of Congress's GitHub repository for the project.¹⁸ The model is 
also pip-installable via the LayoutParser library.¹⁹ For more information on the 
visual content recognition model, I refer the reader to the technical paper on the 
Newspaper Navigator dataset, as well as the corresponding data archaeology.²⁰ 

In this section, I describe in more detail the process of utilizing this visual content 
recognition model to extract visual content from Ladino newspapers. The corpus 
of Ladino newspapers consists of 15,820 Ladino newspaper pages from eight titles 
published between 1890 and 1948, amounting to 63.3 gigabytes (GB) of image 
data.²¹ Table 1 presents a breakdown of the Ladino corpus included in this analysis 
according to title and date of publication. Manually assembling a dataset of 
extracted visual content across this full corpus would require hundreds of human 
annotation hours. However, the automated Newspaper Navigator visual content 
recognition model can process multiple pages per second on a single graphics 
processing unit (GPU), making it possible to process the full Ladino corpus in just 
a few hours. 
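
To make the scale of this comparison concrete, the following back-of-envelope sketch converts assumed processing rates into hours. Only the corpus size comes from this chapter; the throughput of two pages per second and the one minute of manual annotation per page are illustrative assumptions, not measurements reported for this project.

```python
# Back-of-envelope estimate; both rates below are assumptions, not measurements.
PAGES = 15_820                  # corpus size reported in this chapter
GPU_PAGES_PER_SECOND = 2.0      # assumed throughput on a single GPU
HUMAN_SECONDS_PER_PAGE = 60.0   # assumed manual annotation time per page


def hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


gpu_hours = hours(PAGES / GPU_PAGES_PER_SECOND)
human_hours = hours(PAGES * HUMAN_SECONDS_PER_PAGE)

print(f"GPU:   {gpu_hours:.1f} hours")
print(f"Human: {human_hours:.1f} hours")
```

Under these assumptions the automated pass takes roughly two hours, while manual annotation runs to several hundred hours; different assumed rates would shift both figures proportionally.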

To begin the processing of these pages, I first moved the high-resolution 
images of digitized Ladino newspaper pages to a private Amazon AWS S3 bucket, 
a form of cloud storage that facilitates fast computing against the corpus. I then 
wrote code to process these pages using the existing Newspaper Navigator visual 


17 LC Labs, “Beyond Words,” accessed March 31, 2021, http://beyondwords.labs.loc.gov/#/. 

18 Benjamin Charles Germain Lee, LibraryOfCongress/Newspaper-Navigator, GitHub Repository, 
Library of Congress, 2020, accessed December 10, 2020, 
https://github.com/LibraryOfCongress/newspaper-navigator. 

19 Zejiang Shen et al., “LayoutParser: A Unified Toolkit for Deep Learning Based Document 
Image Analysis,” ArXiv:2103.15348 [Cs], March 29, 2021, accessed June 10, 2021, 
http://arxiv.org/abs/2103.15348. 

20 Benjamin Charles Germain Lee, “Compounded Mediation: A Data Archaeology of the Newspaper 
Navigator Dataset,” Digital Humanities Quarterly 15, no. 4, 2021, accessed January 10, 2022, 
http://digitalhumanities.org/dhq/vol/15/4/000578/000578.html. 

21 Lee et al., “The Newspaper Navigator Dataset.” 
Figure 1: An example of the visual content recognition model's predictions on a page from 
Chronicling America. Chronicling America, The Ogden Standard, April 16, 1918, accessed 
December 10, 2020, https://chroniclingamerica.loc.gov/lccn/sn85058396/1918-04-16/ed-1/seq-6/. 
For each bounding box, the predicted category is shown in the top left, along with the 
machine learning model's confidence score. 
content recognition model that had been trained on the Beyond Words annotations. 
Because machine learning models can be stored as single files known as 
“weights files,” which can be loaded onto a computer and utilized for processing 
data with just a few lines of code, the majority of this code was devoted to 
handling the downloading of images from the cloud and processing the images in 
parallel. To deploy this code, I ran the processing pipeline on a rented Amazon 
AWS g4dn.12xlarge EC2 instance consisting of 48 CPUs and four NVIDIA T4 
GPUs.²² In total, the pipeline extracted six classes of visual content across the 
corpus of Ladino newspapers: photographs, illustrations, maps, comics, editorial 
cartoons, and advertisements.²³ I then saved the resulting extracted images, 
as well as metadata from the machine learning model, to the AWS S3 bucket, 
making it straightforward to download the full dataset and relevant 
subsets as necessary. I am currently in the process of investigating options for 
making this dataset of extracted visual content available to researchers and the 
public alike. 
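
The overall shape of such a pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for this chapter: `fake_detector` is a hypothetical stand-in for loading the weights file and running the model on a downloaded image, the page keys are invented, and a thread pool stands in for whatever form of parallelism the actual pipeline used. The sketch also drops predicted headlines, mirroring their omission from the resulting dataset.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Prediction = Dict[str, object]  # keys: "page", "category", "score"


def fake_detector(page_key: str) -> List[Prediction]:
    """Hypothetical stand-in for the real model: returns fixed predictions."""
    return [
        {"page": page_key, "category": "advertisement", "score": 0.97},
        {"page": page_key, "category": "headline", "score": 0.88},
    ]


def process_pages(
    page_keys: List[str],
    detect: Callable[[str], List[Prediction]],
    max_workers: int = 4,
) -> List[Prediction]:
    """Run `detect` on every page in parallel and drop predicted headlines."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_page = list(pool.map(detect, page_keys))  # preserves input order
    return [
        pred
        for preds in per_page
        for pred in preds
        if pred["category"] != "headline"
    ]


# Invented object keys, for illustration only.
pages = [f"la_vara/page_{i:04d}.jp2" for i in range(8)]
results = process_pages(pages, fake_detector)
print(len(results), "items kept")
```

In a real deployment the detector function would download each image and run the trained model on it; keeping that step behind a single callable is what allows the surrounding code to stay short.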

A breakdown of identified visual content is presented in Table 2. Because the 
machine learning model returns a confidence score with each predicted bound- 
ing box, and because one's choice of threshold cut on confidence score affects 
one's tradeoff between false positives and false negatives and thus changes the 
size of the resulting dataset, I include three cuts on confidence score in the table: 
90%, 70%, and 50%. 
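
The effect of the cut can be illustrated with a toy example. The predictions below are invented for demonstration and are not drawn from the Ladino dataset; the point is only that lowering the threshold admits more items (fewer false negatives) at the cost of admitting less certain ones (more false positives).

```python
from collections import Counter

# Hypothetical (category, confidence) predictions, for illustration only.
predictions = [
    ("photograph", 0.95), ("photograph", 0.72), ("map", 0.55),
    ("advertisement", 0.99), ("advertisement", 0.81), ("advertisement", 0.64),
    ("illustration", 0.48),
]


def counts_at(threshold: float) -> Counter:
    """Count predictions per category whose confidence exceeds `threshold`."""
    return Counter(cat for cat, score in predictions if score > threshold)


for cut in (0.9, 0.7, 0.5):
    kept = counts_at(cut)
    print(f"> {cut:.0%}: {sum(kept.values())} items {dict(kept)}")
```

In this toy data the low-confidence map prediction only appears at the 50% cut, which is the kind of behavior one would expect from a class dominated by false positives.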

Notably, this visual content recognition model was trained on annotated 
World War I-era newspaper pages in Chronicling America, rather than annotated 
Ladino pages. Consequently, the resulting dataset contains a nontrivial number 
of false positives and false negatives, as evidenced by the map class, which largely 
consists of false positives. It should be noted that the performance of the visual 
content recognition model is dependent on a confluence of factors, ranging from 
page layout to time period, typeface, language, and even subtleties of the digiti- 
zation pipeline, such as the scanner used to image the pages. For an analysis of 
the effects of time period on the generalization performance of the visual content 
recognition model, I refer the reader to the Newspaper Navigator dataset paper.²⁴ 
It is undoubtedly the case that the utilization of a model trained on annotations 
for Chronicling America pages, rather than Ladino pages, impacts the resulting 


22 The total computing costs for testing and deploying the pipeline amounted to less than 
50 USD. 

23 Because the pages did not have corresponding OCR (for reasons described earlier in this chap- 
ter), this pipeline did not attempt to extract any textual captions within the predicted bounding 
boxes, and predicted headlines were also omitted from the resulting dataset. 

24 Lee et al., “The Newspaper Navigator Dataset.” 
dataset. This effect can be quantified by evaluating the performance of the visual 
content recognition model on a hand-labeled test sample of Ladino pages (an 
evaluation left for future work). However, as evidenced by the analysis of the 
dataset presented in the next section, it is clear that the resulting dataset is of 
more than sufficient quality for supporting downstream exploration and research 
pertaining to questions of humanistic inquiry. 
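
One common way to carry out such an evaluation in object detection is to match predicted bounding boxes to hand-labeled boxes by intersection over union (IoU) and then report precision and recall. The sketch below is a simplified illustration, not the evaluation protocol of the Newspaper Navigator project: it ignores categories and confidence scores, uses a greedy one-to-one matching, and the boxes and the 0.5 IoU cutoff are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted, labeled, min_iou=0.5):
    """Greedily match each prediction to at most one unmatched labeled box."""
    unmatched = list(labeled)
    true_positives = 0
    for box in predicted:
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= min_iou:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


# Invented boxes for a single page: two labeled items, three predictions.
labeled = [(0, 0, 100, 100), (200, 200, 300, 300)]
predicted = [(10, 10, 100, 100), (400, 400, 450, 450), (500, 0, 550, 50)]
print(precision_recall(predicted, labeled))
```

Here one prediction overlaps a labeled box closely, two predictions match nothing (false positives), and one labeled box is never found (a false negative), so both precision and recall fall below one.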


Table 1: Ladino newspaper titles with corresponding number of images and digitized 
pages processed using Newspaper Navigator. In the case of La Vara, each image contains 
two newspaper pages. In bold are the statistics for all digitized pages for a given title. 


Newspaper Title                                # of Digitized Images  # of Pages
El Instruktor revista siyentifika i literaria                    331         331
El jugeton, Jurnal umoristiko                                      4           4
El Kirbatch Americano (1915-1917)                                206         206
El Luzero Sefaradi (October 1926)                                 28          28
El Luzero Sefaradi (May 1927)                                     28          28
El Luzero Sefaradi (Total)                                        56          56
El Progresso / Yosef Daat                                        332         332
El Tiempo (1890-1891)                                            926         926
El Tiempo (1896-1897)                                          1,138       1,138
El Tiempo (1900)                                                 584         584
El Tiempo (1900-1901)                                            981         981
El Tiempo (Total)                                              3,629       3,629
La Boz de Oriente (April 1931-April 1932)                        860         860
La Vara (January 9, 1922-June 4, 1923)                           141         282
La Vara (May 1, 1927-December 17, 1929)                          675       1,350
La Vara (January 3, 1930-December 20, 1932)                      701       1,402
La Vara (December 7, 1932-April 25, 1941)                        667       1,334
La Vara (January 6, 1933-December 27, 1935)                      636       1,272
La Vara (January 3, 1936-August 26, 1938)                        704       1,408
La Vara (September 2, 1938-April 25, 1941)                       681       1,362
La Vara (May 2, 1941-December 29, 1944)                          648       1,296
La Vara (January 5, 1945-February 13, 1948)                      348         696
La Vara (Total)                                                5,201      10,402
Total                                                         10,619      15,820
Table 2: A breakdown of extracted visual content in the Ladino newspaper titles 
processed. Three different cuts on the visual content recognition model's confidence 
score (90%, 70%, and 50%) are presented to show the effect of the cut choice when 
favoring false positives or false negatives. 


Visual Content Type  # > 90%  # > 70%  # > 50%
Photographs              348      770    1,060
Illustrations             52      378      960
Maps                      27      182      300
Comics                    10       39      184
Editorial Cartoons         8       31      111
Advertisements        18,381   31,523   42,505
Total                 18,826   32,923   45,120


3 Analyzing the Excavated Visual Content 


To begin the macroscopic analysis of the excavated visual content, I created clus- 
ter-based visualizations of advertisements and photographs, grouped according 
to their semantic content. In this step, I generated image embeddings for all of 
the extracted visual content; to accomplish this, I modified the Newspaper Nav- 
igator pipeline code, available in the Library of Congress GitHub repository for 
Newspaper Navigator, and utilized img2vec, a library for the streamlined generation 
of image embeddings from image files.²⁵ The image embeddings utilized 
in this analysis are lower-dimensional representations of the images extracted 
from the hidden layers of ResNet-18 and ResNet-50, two neural image classification 
models.²⁶ Originally trained on ImageNet, these models can classify images 
according to their content (e.g., “dog” or “cat”).²⁷ Because these models capture the 
semantics of images, image embeddings generated by feeding images into these 
models capture semantic similarity: if the distance between two image embed- 
dings (vectors that are each hundreds or thousands of dimensions in length) is 


25 Lee, GitHub Repository; Christian Safka, Christiansafka/Img2vec, Python, 2021, accessed 
December 10, 2020, https://github.com/christiansafka/img2vec. 

26 K. He et al., “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on 
Computer Vision and Pattern Recognition (CVPR), 2016, 770-78, accessed December 10, 2020, 
https://doi.org/10.1109/CVPR.2016.90. 

27 J. Deng et al., “ImageNet: A Large-Scale Hierarchical Image Database,” in 2009 IEEE Conference 
on Computer Vision and Pattern Recognition, 2009, 248-55, accessed December 10, 2020, 
https://doi.org/10.1109/CVPR.2009.5206848. 
small, the corresponding images likely have similar semantics. Thus, generating 
visualizations of photos and advertisements clustered based on the image embed- 
dings can provide an informative summary of the landscape of visual content. To 
make the high-dimensional clustering visible in two-dimensional visualizations, 
I have utilized T-SNE, a dimensionality reduction algorithm that preserves close 
clusters of points, meaning that clustered points in the visualization are also clus- 
tered in the high-dimensional embedding space. It should be noted that long dis- 
tances are not preserved in T-SNE, so the relative positions of clusters should not 
be taken into consideration.²⁸ 
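
The notion of distance between embeddings can be made concrete with a small example. The sketch below uses invented three-dimensional vectors and Euclidean distance purely for illustration; actual ResNet embeddings have hundreds or thousands of dimensions, and the identifiers are hypothetical. It does not reproduce the dimensionality reduction step, only the nearest-neighbor intuition that underlies the clustering.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(query_id, embeddings, k=2):
    """Return the ids of the k embeddings closest to `query_id`."""
    query = embeddings[query_id]
    others = [
        (euclidean(query, vec), key)
        for key, vec in embeddings.items()
        if key != query_id
    ]
    return [key for _, key in sorted(others)[:k]]


# Toy vectors standing in for image embeddings (assumed values).
embeddings = {
    "portrait_a": [0.9, 0.1, 0.0],
    "portrait_b": [0.8, 0.2, 0.1],
    "crowd_a": [0.1, 0.9, 0.2],
    "wartime_a": [0.0, 0.2, 0.9],
}
print(nearest("portrait_a", embeddings, k=1))
```

In this toy data the two portrait vectors lie close together, so each is the other's nearest neighbor, while the crowd and wartime vectors sit farther away.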

To start, I generated visualizations of the 348 photographs identified by the 
Newspaper Navigator visual content recognition model with confidence scores 
greater than 90%. Figure 2a presents the cluster visualization of these 348 pho- 
tographs. In this visualization, I present a summary view that would have ordi- 
narily required manually inspecting and analyzing 15,820 pages. Examining the 
visualization, it is immediately apparent that many photographs depict people. 
Portrait shots, such as the ones clustered together and depicted in Figure 2b, are 
one of the most common types of image. Other notable clusters include wartime 
photographs (shown in Figure 2c) and crowds and groups of people (shown in 
Figure 2d). 

As shown in Table 2, the vast majority of the identified visual content consists of 
advertisements. The advertisements embedded within the Ladino press attest to 
daily life within Sephardic culture from Constantinople to New York. By studying 
these extracted advertisements at a macroscopic scale, it is possible to augment 
the extant historiography, including Sarah Stein's detailed analysis of advertisements 
in the Ladino and Yiddish press.²⁹ 

Given the number of advertisements identified, I chose to generate cluster 
visualizations of advertisements for La Vara for different temporal regions. 
Figure 3a shows a cluster visualization of 2,812 advertisements extracted from 
La Vara issues published between January 3, 1936, and August 26, 1938. The vis- 
ualization reveals numerous distinct clusters of the same advertisements repro- 
duced multiple times throughout multiple issues of La Vara within the given tem- 
poral range. Many of these clusters contain dozens of the same advertisement, 
reflecting businesses that chose to advertise consistently within the pages of 
La Vara. Figure 3b shows six of these clusters, along with magnified versions of 


28 Laurens van der Maaten and Geoffrey Hinton, “Visualizing Data Using T-SNE,” Journal of 
Machine Learning Research 9, no. 86 (2008): 2579-605. For more information on T-SNE, see: Martin 
Wattenberg, Fernanda Viégas, and Ian Johnson, “How to Use T-SNE Effectively,” Distill 1, no. 10 
(October 13, 2016): e2, accessed December 10, 2020, https://doi.org/10.23915/distill.00002. 

29 Stein, Making Jews Modern, 153-201. 
Figure 2a: A cluster visualization of the 348 photographs identified by the visual content 
recognition model with confidence scores greater than 90%. I constructed this visualization 
using ResNet-50 embeddings and T-SNE for dimensionality reduction. 


the reproduced advertisements. Examples of advertisers with reproduced adver- 
tisements in the dataset include Brockman Monument Works, Meyer London's 
Matzos, Standard Truss Co., Golden Wine & Liquor Co., Paradise Interior Deco- 
rators, Aristocratic Imported Virgin Olive Oil, Harem Oriental Pastry, Constan- 
tinople Oriental Pastry Shop, Macedonia Importing Co., Royal Hall, Mid-Bronx 
Used Car Exchange, the Luxor Food Market, the Luxor Restaurant, the Sephardic 
Jewish Center, Inc., Joseph Levy (furniture, radios, and oilcloths), Louis J. Opal 
(counselor at law), Simon S. Nessim (counselor at law), Dr. J. Feitelson (dental 
surgeon), Irving Matalon, P. Vladeff, and Madame Gilda Malky. The overwhelm- 
ing majority of these advertisements are for local New York City businesses and 
also feature prominent English text, both of which reflect La Vara's role as an 
American Sephardic press within New York City. 
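
The idea of surfacing reprinted advertisements from embeddings can be sketched as a simple grouping step. The following is an illustrative simplification, not the procedure used to produce the figures: it assigns each item to the first group whose first member lies within an assumed distance threshold, and the two-dimensional vectors, identifiers, and threshold of 0.1 are all invented for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def group_reprints(items, max_distance=0.1):
    """Greedy grouping: add an item to the first group whose first member is close."""
    groups = []
    for item_id, vec in items:
        for group in groups:
            if euclidean(vec, group[0][1]) <= max_distance:
                group.append((item_id, vec))
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([(item_id, vec)])
    return [[item_id for item_id, _ in group] for group in groups]


# Invented identifiers and toy embeddings for four extracted advertisements.
ads = [
    ("1936-01-03_ad1", [0.50, 0.50]),
    ("1936-01-10_ad4", [0.51, 0.49]),
    ("1936-01-17_ad2", [0.50, 0.52]),
    ("1936-01-17_ad7", [0.10, 0.90]),
]
groups = group_reprints(ads)
repeated = [g for g in groups if len(g) > 1]
print(repeated)
```

Groups with more than one member correspond to candidate reprints of the same advertisement across issues; counting group sizes would then give a rough measure of how consistently a business advertised.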
Figure 2b: A magnified cluster within Figure 2a consisting of portrait shots of people. 


Analysis of the surfaced advertisements that have been reproduced many 
times over reveals similar advertising patterns to those uncovered within El 
Tiempo by Sarah Stein: a preponderance of advertisements of a sartorial nature, as 
well as for doctors, dentists, medical treatments for ailments, and legal counsel.³⁰ 
In the case of Constantinople-based El Tiempo, Stein argues that these advertise- 
ments speak to readers' anxieties under the precarity of Ottoman rule, including 
class and economic anxieties and aspirations. The apparent resonances between 
the advertisements in El Tiempo and those in the New York-based La Vara suggest 
an even broader pattern of Sephardic Jewish experiences in response to social 
and economic uncertainty and change during the late 19th and early 20th centu- 
ries, whether in the United States or the Ottoman Empire. 


30 Stein, Making Jews Modern, 185-87. 
Figure 2c: A magnified cluster within Figure 2a consisting of wartime photographs. 


Of particular interest are the recovered advertisements for Meyer London's 
Matzos (entry b in Figure 3b), which appeared concurrently in American Yiddish 
newspapers. As described by Makena Mezistrano, “Matsa advertisements in the 
American Yiddish and Ladino presses offer a rare opportunity to place these two 
communities in dialogue with one another, instead of only positioning them as 
separate or in bitter conflict - two common assumptions about intra-Jewish relationships 
in twentieth century New York.”³¹ Thus, the advertisements uncovered 
through this cluster-based analysis speak to not only individual Sephardic com- 
munities but also relationships between and across communities, as embedded 
within cultural practices. 

However, not all clusters correspond to advertisements. In Figure 3c, magni- 
fied clusters of extracted photographs and full newspaper pages from La Vara are 
shown. These clusters are false positives, reflecting the imperfect performance 
of the visual content recognition model utilized for visual content extraction. 
These clusters are an important reminder that algorithmic approaches to extract- 
ing visual content are inevitably imperfect. However, using clustering and other 
machine learning techniques, it is possible to remove many of these false posi- 
tives quickly. 


31 Mezistrano, “Why Are These Passover Ads Different from All Other Ads?” 
Figure 2d: A magnified cluster within Figure 2a consisting of photographs of crowds and 
groups of people. 


4 Future Work 


Ongoing work consists of continuing to explore the extracted visual content 
via both macroscopic analysis and close analysis within the page-level context. 
In terms of macroscopic analysis, I plan to expand the study of the reprinting 
patterns of advertisements in order to examine the network of advertisers that 
funded the Ladino press. By building on the provocations offered in this chapter, 
one can ask questions such as: did advertisers purchase advertising space in 
different titles? And what does this tell us about the interconnectedness of the 
Ladino press? Moreover, I plan to expand this analysis to different temporal slices 
of La Vara along with quantitative assessments of different photograph types in 
order to understand the evolution of the visual content. I will also expand this 
analysis to include a greater exploration of the other Ladino titles present in the 
Figure 3a: A cluster visualization of 2,812 advertisements identified by the visual content 
recognition model with confidence scores greater than 90% within issues of La Vara published 
between January 3, 1936 and August 26, 1938. I constructed this visualization using ResNet-50 
embeddings and T-SNE for dimensionality reduction. 


dataset (as enumerated in Table 1). With this future analysis, I can begin to ask 
questions surrounding the intended audiences of the advertisements and how 
they changed over time as well as by title, building on Sarah Stein's analysis of 
the advertisements in El Tiempo.³² 

Because the analysis of the visual content in this chapter has focused on the 
extracted dataset, future work also entails understanding the visual content by 
recontextualizing it at the page level within the broader mise en page. What types 
of articles accompany visually similar photographs? What advertisements appear 
next to one another? What do the captions reveal about the visual content? 


32 Stein, Making Jews Modern, 153-201. 
Figure 3b: Six different magnified clusters within Figure 3a showing advertisements reprinted 
throughout different issues of La Vara (top), along with magnified versions of the reproduced 
advertisements (bottom). The advertisements are for Brockman Monument Works (a), Meyer 
London's Matzos (b), Standard Truss Co. (c), and Aristocratic Imported Virgin Olive Oil (d, e, f). 


From a computational perspective, future work with the dataset of extracted 
visual content entails evaluating the generalization of the visual content recog- 
nition model on the Ladino newspaper pages. This evaluation will require manu- 
ally annotating enough pages across different titles and temporal slices in order 
to derive reliable statistics. With this in-depth evaluation across many different 
Figure 3c: Magnified clusters of photographs and full pages, showing false positives among 
the identified advertisements from La Vara. 


newspapers at varying time periods, it is possible to better understand the bias 
of the visual content recognition model, which will, in turn, inform the results of 
this macroscopic analysis even further. Lastly, given that so many of the advertisements 
had captions written in English, future work entails running English 
OCR engines on the advertisements and utilizing the results for textual analysis of the 
captions. 

Other potential work includes cross-matching the visual content in the Ladino 
press with the extracted visual content from other newspaper corpora, such as 
the visual content from 16 million pages in Chronicling America contained within 
the Newspaper Navigator dataset. Identifying reproduction patterns among the 
visual content within these different presses could indeed speak to the prox- 
imity or marginal position of the Ladino press in relation to broader American 
newspaper syndicates. Moreover, Makena Mezistrano's discovery of Meyer Lon- 
don's Matzos advertisements in both the American Sephardic press and Ameri- 
can Ashkenazic press suggests that this future direction of cross-matching visual 
content across different presses has the capacity to enrich our understanding of 
how cultural practices change across different communities. 

Lastly, in regard to the dataset of extracted visual content from Ladino titles, 
I have two primary goals for future work. First, as articulated earlier in this 
chapter, I plan to make this dataset of extracted visual content publicly available 
to encourage re-use among scholars and the public. Second, I hope to expand 
the Ladino titles that have been processed in order to further excavate the visual 
content across the Ladino press. 


5 Ethical Considerations 


Given the profound implications of machine learning perpetuating marginali- 
zation and erasure through algorithmic bias and other mechanisms, any appli- 
cation of machine learning to cultural heritage collections would be remiss 
without a discussion of the ethical considerations surrounding doing so. Within 
the library, archive, and museum (“LAM”) community, there has been a growing 
effort to consider a critical, sociotechnical lens surrounding machine learning, 
data science, and cultural heritage. This effort has culminated in the develop- 
ment of responsible operations and best practices, as well as surveys of projects 
in this liminal space.³³ In the case of difficult and understudied histories, extra 
precaution must be taken, and a rich discourse in the scholarly community has 
explored the ethics of datafication and the application of machine learning methodologies 
within this context.³⁴ In this section, I draw from these emerging bodies 
of work and explicitly build on the Newspaper Navigator data archaeology, which 
I wrote in order to detail the implications of machine learning for search and discovery 
from a sociotechnical perspective.³⁵ 

Though often overlooked, the marginal position of Sephardic Studies within 
Jewish Studies has been amplified by machine learning, which has altered the discoverability 
of the Sephardic historical record via digitization. As detailed earlier 
in this chapter, off-the-shelf OCR algorithms consistently fail to transcribe Ladino 


33 Ryan Cordell, “Machine Learning + Libraries: A Report on the State of the Field,” 2020, accessed 
July 22, 2020, https://labs.loc.gov/static/labs/work/reports/Cordell-LOC-ML-report.pdf; Thomas 
Padilla, “Responsible Operations: Data Science, Machine Learning, and AI in Libraries,” OCLC, 
August 26, 2020, accessed December 15, 2020, 
https://www.oclc.org/research/publications/2019/oclcresearch-responsible-operations-data-science-machine-learning-ai.html; 
Eileen Jakeway et al., “Machine Learning + Libraries Summit Event Summary,” 2020, accessed 
February 12, 2020, https://labs.loc.gov/static/labs/meta/ML-Event-Summary-Final-2020-02-13.pdf. 

34 Todd Presner, “The Ethics of the Algorithm: Close and Distant Listening to the Shoah Foundation 
Visual History Archive,” in Probing the Ethics of Holocaust Culture, ed. Claudio Fogu, Wolf 
Kansteiner, and Todd Presner (Cambridge, MA: Harvard University Press, 2020), 175-202; Benjamin 
Charles Germain Lee, “Machine Learning, Template Matching, and the International Tracing 
Service Digital Archive: Automating the Retrieval of Death Certificate Reference Cards from 40 
Million Document Scans,” Digital Scholarship in the Humanities 34, no. 3 (September 1, 2019): 
513-35, accessed December 15, 2020, https://doi.org/10.1093/llc/fqy063. 

35 Lee, “Compounded Mediation.” 
texts with a high enough degree of fidelity to facilitate reliable keyword search 
and textual analysis. This effective erasure of Ladino texts from search and dis- 
covery platforms is the result of a confluence of factors, from the availability of 
training data to the monetary value of preferentially selecting widely studied 
languages for inclusion in proprietary OCR engines. Significantly, this linguis- 
tic erasure is not limited to Ladino: a similar systemic problem has been docu- 
mented for Yiddish and indigenous languages, which speaks to a specific form of 
algorithmic bias, in which human decisions surrounding which languages should 
be prioritized when training OCR algorithms have a profound impact on resulting 
scholarship.³⁶ 

In this chapter, I seek not only to foreground the algorithmic marginalization 
of Sephardic history but also to offer an alternative approach to recover the voices 
that have been lost through digitization. Certainly, the utilization of machine 
learning to excavate visual content in Ladino newspapers is not without its own 
challenges. The Newspaper Navigator visual content recognition model performs 
better on pages that more closely resemble the training data, and the extracted 
dataset presented in this chapter suffers from a nontrivial number of false positives 
and false negatives as a result.³⁷ These false positives and false negatives 
motivate methodological improvements, such as training a visual content rec- 
ognition model specifically for the Sephardic press in order to better capture the 
nuances of Sephardic visual culture. Moreover, image recognition algorithms 
used to evaluate image similarity have been shown to perpetuate their own forms 
of bias and marginalization.³⁸ Because these algorithms have been trained by 
machine learning practitioners with specific objectives and categories in mind, a 
fundamental question is raised as to whether the groupings identified by the algo- 
rithms capture the relationships most valuable to scholars. While these methods 
have the capacity to expose new groupings, they inevitably distort the viewer's 
perceptions of what constitutes similarity. I therefore offer these methodological 
approaches with such considerations in mind, a reminder of the importance of 
canonical historiographic approaches that can be used in concert with machine 
learning. 

And yet, this chapter has provided the first macroscopic view of the Ladino 
press via the excavated visual content and thus serves as a corrective to the algo- 


36 Hannah Alpert-Abrams, “Machine Reading the Primeros Libros,” Digital Humanities Quarterly 
10, no. 4 (October 4, 2016), accessed December 15, 2021, 
http://www.digitalhumanities.org/dhq/vol/10/4/000268/000268.html; “Yiddish OCR Is Live! | Yiddish 
Book Center,” accessed March 31, 2021, https://www.yiddishbookcenter.org/about/news/yiddish-ocr-live. 

37 Lee et al., “The Newspaper Navigator Dataset.” 

38 Lee, “Compounded Mediation.” 
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rithmic marginalization of Sephardic Studies. I therefore offer this work in pursuit 
of a digital humanities and a Jewish Studies that foreground Sephardic history 
and culture. 
Abby Gondek

Using Nodegoat to Track Gendered Political 
Networks: Henrietta Klotz's Influence on 
Henry Morgenthau Jr.'s Advocacy for Jewish 
Refugees and the State of Israel 


Abstract: In histories of how the U.S. government responded to the Holocaust, 
women’s roles are often invisible or under-explored. This is because political influ- 
ence and power are typically defined as being the domain of men, and because 
there were very few women in roles that were (or are) deemed as “powerful.” 
Women were more likely to be in secretarial roles; secretaries are not usually perceived
as being influential. However, secretaries could exert political influence.
This chapter presents a case study of the impact that Henry Morgenthau Jr.’s assis- 
tant for 37 years, Henrietta Stein Klotz, had on the Secretary of the Treasury’s posi- 
tions in response to the Holocaust and his post-war fundraising for the State of 
Israel. 

A web-based data management, network analysis, and visualization environment,
nodegoat, can be used to track the intersection of what is typically gendered
female (the micro, private, social, and cultural) with what is typically gendered


1 Rachel Century, “Dictating the Holocaust: Female Administrators of the Third Reich” (Royal 
Holloway, University of London, 2012), 182, 185, 190, 259; Leisa D. Meyer, Creating GI Jane: Sexu- 
ality and Power in the Women’s Army Corps during World War II (New York: Columbia University 
Press, 1996), 71-72, 76, 80-82. Century contends that the female SS auxiliaries (“helferinnen”) 
were subordinate to their male bosses; they were “helpers, working for, not with the men" (259, 
emphasis in original). Though they may have been close or intimate (sexual) with the men who 
employed them, these secretaries did not have access to their bosses’ confidential papers, with 
rare exceptions (182, 185, 190). Meyer articulates how women’s World War II employment was 
temporary and they were unable to change their second-class status (71-72). Seventy percent of 
women in the Women’s Army Corps worked in fields traditionally defined as “women’s work,”
such as communication or clerical employment (72, 76). Male officers expected women to serve 
them through this gendered work, as nannies, maids, waitresses, and cooks (80-82). 

2 Elisabeth Krimmer, German Women’s Life Writing and the Holocaust: Complicity and Gender 
in the Second World War (Cambridge: Cambridge University Press, 2018), 10, 34, 36, 38; Judith 
Tydor Baumel, “Women’s Agency and Survival Strategies during the Holocaust,” Women’s Stud- 
ies International Forum 22, no. 3 (1999): 334-35. Krimmer argues that female secretaries within 
the Nazi government were able to wield more power than the term "auxiliary" typically implies. 
Baumel maintains that female secretaries were able to influence the male leadership of the Jew- 
ish councils, the ghetto police, and underground networks. 


Open Access. © 2022 Abby Gondek, published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
https://doi.org/10.1515/9783110744828-011 
masculine (the macro, public, institutional, and structural). Nodegoat enables the
creation of multi-layered networks in which various types of nodes (“objects”) and 
relationships can be displayed simultaneously, demonstrating how interpersonal 
networks (typically gendered female) are simultaneously structural and integrate 
broader social phenomena such as ethnicity, religion, and politics (typically gen- 
dered male). Thus, nodegoat can demonstrate how women in secretarial roles, like 
Henrietta Klotz, could exert political influence through their ethnic, religious, and 
interpersonal networks. 


Keywords: Henrietta Klotz, Henry Morgenthau Jr., historical social network 
analysis, Holocaust, Israel, Jewish women, nodegoat, politics, secretaries, U.S. 
government 


1 Introduction 


Henry Morgenthau Jr. (HMJ) was an assimilated German Jew whose father (Henry 
Morgenthau Sr.) preferred his son steer clear of “Jewish affairs.” Morgenthau Sr.
(HM Sr.) was the U.S. Ambassador to the Ottoman Empire (Turkey) during the
Armenian genocide and advocated for Armenians who were victims of “race
extermination.” HM Sr. directly linked the plight of Armenians to that of the
Jews, writing to his wife Josephine in 1913 that, like Jews, Armenians demonstrated
a “stubborn adherence to their religion and a very strong race pride.”
HMJ served as the Secretary of the Treasury in the Roosevelt and Truman administrations
from January 1, 1934 to July 22, 1945. In 1943, Morgenthau Jr. reluctantly
became involved in the conflict between Treasury and State regarding what to
do with Jewish refugees. According to John Pehle, who was the director of the
War Refugee Board, Morgenthau Jr. “didn’t want to stand out as a Jew.” Henry


3 Michael Beschloss, The Conquerors: Roosevelt, Truman, and the Destruction of Hitler's Germa- 
ny, 1941-1945 (New York: Simon & Schuster, 2002), 45-46. 

4 Rouben Paul Adalian, “Morgenthau, Ambassador Henry, Sr.,” Armenian National Institute,
2019, accessed January 10, 2022, https://www.armenian-genocide.org/morgenthau.html. 

5 Henry Morgenthau III, Mostly Morgenthaus: A Family History (e-Book), 2nd ed. (Lexington, MA: 
Plunkett Lake Press, 2019), 109, accessed January 10, 2022, http://plunkettlakepress.com/mmo.html. 
6 “Henry Morgenthau Jr. Papers, 1866-1953, Finding Aid," Franklin D. Roosevelt Presidential 
Library & Museum, n.d., accessed January 10, 2022, http://www.fdrlibrary.marist.edu/archives/ 
collections/franklin/index.php?p=collections/findingaid&id=159&q=&rootcontentid=72431. 

7 Henry Morgenthau Jr. et al., “Jewish Evacuation Meeting Transcript, December 18, 1943,” Henry 
Morgenthau Jr. Diaries, FDR Presidential Library and Museum 688II (1943): 85-87; Henry Morgenthau
III, Mostly Morgenthaus: A Family History (New York: Ticknor & Fields, 1991), 323-24.
Morgenthau III (HMJ's son) explained that the Jewish advisors closest to Roosevelt
(including his father) “avoided or downplayed the significance of Jewish
questions.” After his forced resignation in 1945, Morgenthau Jr. began to participate
more explicitly in Jewish causes; he was the chairman of the United Jewish
Appeal and financially advised the State of Israel. I use nodegoat to explore
an understudied reason why Henry Morgenthau Jr. took the stances he did
regarding Jewish refugees at the end of World War II, and afterward.

The War Refugee Board developed through HMJ's Treasury department, and 
he was one of the core members of the Board (Figure 1). Morgenthau's male staff 
members (especially Randolph Paul, John Pehle, and Josiah DuBois) are often 
identified as being highly influential on his positions regarding rescue and relief 
during the latter part of World War II. Henrietta Klotz, his assistant, has been
under-examined as a central influencer for the political decisions HMJ made in 


8 Morgenthau III, Mostly Morgenthaus, 322-23. 

9 According to Henry Morgenthau III, when Roosevelt died suddenly and Truman became Pres- 
ident, Truman did not approve of Morgenthau Jr.’s harsh stance toward post-war Germany. Tru- 
man felt HMJ tried to reach beyond his role as Secretary of the Treasury. The new president sided 
with his ally Henry Stimson (Secretary of War) against HMJ. HM III emphasized the antisemitism 
he detected in the behavior of these two men. Stimson referred to HMJ's “Jewish vengeance." 
HM III asserted that Truman feared that because of the order of succession, if he and Secretary 
of State Byrnes died, HMJ would become the new president. HM III underscored, “The possibil- 
ity of having a Jewish president seems to have been one of the concerns." Morgenthau III also 
cited growing anti-Soviet sentiment and the association of Jews with communism as a factor 
in Truman’s decision to replace HMJ with Fred Vinson as Secretary of the Treasury. Vinson and 
Byrnes, both southerners, “were often seen as a pair.” See Morgenthau III, Mostly Morgenthaus
(e-Book), 329-30. 

10 Morgenthau III, Mostly Morgenthaus (e-Book), 334. The United Jewish Appeal was a philan- 
thropic umbrella organization established in 1939, which eventually became what is now the 
Jewish Federations of North America. UJA was made up of three agencies: the Joint Distribution
Committee (JDC), the United Palestine Appeal (UPA), later the United Israel Appeal, and the
National Coordinating Committee for Aid to Refugees (NCCR), later the National Refugee Service.
The UPA represented Zionist interests. 

11 NYT Archive, “H.S. Klotz, 87, Aide to Treasury Secretary,” New York Times Online, December 
21, 1988, accessed January 10, 2022, https://www.nytimes.com/1988/12/21/obituaries/h-s-klotz- 
87-aide-to-treasury-secretary.html; Rebecca Erbelding, “Morgenthau Family Papers, 1860-2015, 
2015.255.1 Finding Aid” (Washington DC, 2015), 2, accessed January 10, 2022, https://collections. 
ushmm.org/search/catalog/irn96059. 

12 Rebecca Erbelding, Rescue Board: The Untold Story of America’s Efforts to Save the Jews of 
Europe (New York: Doubleday, 2018), 53; Randolph Paul, “Report to the Secretary of the Acqui- 
escence of this Government in the Murder of the Jews, given to Henry Morgenthau Jr. by His Staff 
(Josiah DuBois, John Pehle, and Randolph Paul) on January 13, 1944,” Morgenthau Diaries 693 
(1944): 212-29; Morgenthau III, Mostly Morgenthaus, 323. 




Figure 1: The third meeting of the War Refugee Board, March 21, 1944, in Cordell Hull's office. 
Pictured left to right are: Cordell Hull, Secretary of State, Henry Morgenthau Jr., Secretary 
of the Treasury, Henry Stimson, Secretary of War, and John Pehle, Director of the WRB. FDR
Library General Photograph Collection, Folder: Conferences, Commissions and Committees,
World War II, WRB, photo id: NPx82-61.


relation to the Holocaust, especially after the war, when he became increasingly 
involved in fundraising for the State of Israel. Henry Morgenthau III (HM III) artic- 
ulated Henrietta's importance in this way: “While the Treasury lawyers [Paul, 
Pehle, and DuBois] were the activists in shaping and moving my father's rescue 
campaign, Henrietta Klotz was the catalyst.” However, HM III’s differentiation
between “activists” and “catalyst” is problematic because it implies that Henri- 
etta only initiated HMJ's shift, while his male advisors (Paul, Pehle, and DuBois) 
were the real change agents regarding ongoing rescue policy. In contrast to this 
distinction, I maintain that Henrietta (and other women in similar “auxiliary” 
positions) should be recognized as exercising power in political decision-making
during the Holocaust and immediate post-war period. “Power” can be conceptualized
through Henrietta's words in her interview with HM III: “But you see, I
brought things to his attention.”

13 Morgenthau III, Mostly Morgenthaus, 323.


This chapter therefore asks how, considering HMJ's assimilated
Jewish upbringing and his father's non-Zionist stance, an analysis of Henrietta's
influence can enable a clearer explanation for HMJ's eventual stances
regarding relief and rescue efforts and his advocacy for Israel. It argues that
digital tools, in this case the network analysis and visualization environment, 
nodegoat, offer new methodological possibilities to explore the interpersonal but 
also ethnic, religious, and political connections between Henry Morgenthau Jr. 
and the people closest to him, including Henrietta Klotz. This case study thus 
offers a topical as well as methodological contribution to Jewish Studies and the 
digital humanities; it demonstrates the importance of analyzing women's influ- 
ence on U.S. governmental stances in response to the Holocaust, and it discusses 
how digital tools such as nodegoat can be used to achieve this, by visually track- 
ing gendered political networks of impact. 

This chapter begins with an overview of Henrietta Klotz's religious and political
background (Orthodox Judaism, exposure to Zionism) and her impact on
HMJ, especially in the post-war period. She furthered a cause she believed in by 
facilitating meetings that enabled him to become a leader in the United Jewish 
Appeal and the Israel Bonds program. Section 3 critiques the typical dichotomy 
between ego-centric (qualitative, individual) and whole network (quantitative, 
social structure) approaches to historical social network analysis. I utilize a case 
study of the networks between Mrs. Klotz and HMJ to articulate how ego-centric 
approaches (focusing on the lives of individuals) can be simultaneously struc- 
tural and institutional, demonstrating the intersection of micro and macro levels 
of influence. I also clarify the benefits of using nodegoat to visualize multiple
types of nodes or “objects” at once. In Section 4 (made up of three smaller
sub-sections), I explore the deeper layers of influence on HMJ and Henrietta, 
expanding on the simple network visualization in the second section. I ask who 
and what influenced the people who shaped HMJ and Henrietta, demonstrating 
interpersonal but also ethnic, religious, and political networks of connection. I 
elaborate on the relationships between Henrietta, Henry Morgenthau Sr., and 
Elinor Fatman Morgenthau (HMJ’s first wife) in order to investigate how Henri- 
etta managed to become more influential than both of them. Both Morgenthau 
Sr. and Elinor emphasized assimilation rather than affiliation with the organized 
Jewish community. Though HM Sr.’s belief in collaboration between minority 
ethnic groups inspired HMJ's dedication to saving Jewish refugees during and 
after World War II, Henrietta was the figure who proved to be loyal, protective, 
and encouraging, qualities that HMJ longed for, according to his son Morgenthau 
III. Elinor's role was significant in the realm of democratic politics, but not in


14 Morgenthau III, Mostly Morgenthaus (e-Book), 342. HM III recounted that after “the three people
whose counsel and affection my father most relied on had left him” (FDR in 1945, HM Sr. in
Jewish affairs. Towards the end of her life, she became slightly more involved in
Jewish refugee relief efforts, but sadly, she died in 1949 before the results of this 
involvement could be determined. 


2 Henrietta Klotz: "Letting Israel 
[and Henry Morgenthau Jr.] Grow” 


In contrast to HMJ's secular and elite home environment, Henrietta Stein Klotz 
(Figure 2) was raised in a poor Orthodox Jewish household. Morgenthau's son, 
Henry Morgenthau III, argued that Henrietta was "the key to getting him involved 
in Jewish things.” Henrietta became more instrumental than HMJ's wife, Elinor 
Fatman. Mrs. Klotz was Morgenthau's secretary for 37 years, longer than his
marriage to Elinor (33 years) or to his second wife, Marcelle (16 years). Though 
Henrietta did not articulate why she was so invested in convincing Morgenthau 
to do more to rescue Jewish refugees, her husband, Herman, explained to HM 
III that the reason was in their shared familial contexts: “It is obvious ... that 
between Henrietta's background and my own that our interests permeated Henrietta's
discussions with your father [HMJ] when matters of Jewish interests were
involved.”

According to Herman Klotz, Henrietta nudged and insisted that Morgenthau 
“get the President to take some action that would minimize the killing of the
Jews.” Herman recounted that every morning when HMJ saw Mrs. Klotz in the
office, he would look at her timidly, her countenance insisting “when [will you


1946, and Elinor in 1949), “the course of his life changed as he became the devoted servant of
the Jewish community. Was it partly an act of self-definition, a rebellion against his upbringing,
a craving for love, a respect that he couldn't find anywhere else...? Perhaps the answer is yes to
each.”

15 Herman Klotz, “Herman Klotz to Henry Morgenthau III, October 16, 1985” (Washington DC:
USHMM 2015.255.1 Morgenthau Family Papers, Box 32, File 11, 1985), 7; Beschloss, The Conquer- 
ors, 53. 

16 Beschloss, The Conquerors, 55. 

17 Archive, “H.S. Klotz, 87, Aide to Treasury Secretary," 19. 

18 Edna S. Friedberg, “Elinor Morgenthau (1891-1949),” in Jewish Women: A Comprehensive
Historical Encyclopedia (Jewish Women's Archive, 1999), accessed January 10, 2022,
https://jwa.org/encyclopedia/article/morgenthau-elinor; “Mrs. Henry Morgenthau, 73 Widow of Roosevelt's
Aide,” New York Times Archives, July 19, 1972, accessed January 10, 2022,
https://www.nytimes.com/1972/07/19/archives/mrs-henry-morgenthau-73-widow-of-rooseve-ts-aide.html.

19 Klotz, “Herman Klotz to Henry Morgenthau III, October 16, 1985," 7-8. 
Figure 2: Henrietta Klotz (uncredited) is pictured on the right of this photograph, from the
Harris and Ewing collection at the Library of Congress. “Firm financeer [sic]. Sec. Henry 
Morgenthau, Jr., photographed at a press conference where he answered questions concerning 
the pending financing ... ." December 3, 1935, Library of Congress, https://www.loc.gov/ 
resource/hec.39676/. 


do something]?” If Morgenthau assumed he had done “as much as he could and
absolved himself from further responsibility,” Henrietta was unrelenting in “her
efforts without giving up an inch.” Herman attributed the creation of the War
Refugee Board to his wife's persistence.

In an August 1945 handwritten letter to Henrietta, HMJ proclaimed that she 
was the “watchdog of the Secretary of the Treasury” and asserted: “whatever
credit I deserve” for rescuing Jewish refugees, “I want to share it equally with
you.” He underscored her persuasiveness in “Jewish affairs” where she was “particularly
understanding and helpful”; in fact, he believed she “made a real contribution
towards winning the war.”


20 Klotz, “Herman Klotz to Henry Morgenthau III, October 16, 1985," 5-6. Herman does not spec- 
ify the time period, but it can be assumed that he was referring to the Holocaust period because 
he refers to “the killing of the Jews." 

21 Henry Morgenthau Jr., “Handwritten Letter from Henry Morgenthau Jr. to Mrs. Henrietta 
Klotz, August 5, 1945” (Washington DC: USHMM 2015.255.1 Morgenthau Family Papers, Box 32, 
File 11, 1945), 5, 8-10. The quotes in this paragraph all originate in this letter. 
Soon after FDR's sudden death in April 1945 (in the same month that HMJ's wife,
Elinor Morgenthau, had suffered a heart attack), Morgenthau was forced to leave
the Treasury department. HMJ expressed to Henrietta that if he had perceived
that his time at Treasury was up:


we both of us might have had time to plan our future together. It would have made me very 
happy if we could have continued working together. But to my everlasting sorrow it did not 
work out that way. But I will continue to hope and plan that in the not too distant future we 
will be working side by side once again.


HMJ ensured that this possibility became a reality. He returned to New York, 
while Henrietta remained at Treasury in Washington, DC. HMJ called her daily 
and “begged her” to move back to New York to work with him. Henrietta agreed
to move to New York but only if Morgenthau would work in “public service." She 
set up meetings with Henry Montor, Meyer Weisgal, William Rosenwald, Edward 
Warburg, and Rose Halpern. Through these meetings, Morgenthau became the 
chair of United Jewish Appeal (UJA) in 1946. Montor was in charge of public
relations for the United Palestine Appeal and then became the executive vice
president at UJA and HMJ's “mentor in Jewish affairs.” They worked at UJA until
1948 and then began the Israel Bond program in 1949; Morgenthau Jr. resigned in
1954, having become increasingly less devoted to the program after his marriage
to Marcelle Puthon Hirsch in 1951 (a divorced French Catholic woman who was
uninterested in Israel or Jewish affairs).

Henrietta's obituary made it appear that Morgenthau earned his position at 
UJA on his own and subsequently asked Henrietta to be his assistant. However,
other sources place Henrietta as the architect behind this opportunity, through 


22 Morgenthau III, Mostly Morgenthaus (e-Book), 327-30. For a detailed explanation of the rea- 
sons for Morgenthau Jr.’s forced resignation see footnote 9. 

23 Morgenthau Jr., “Handwritten Letter from Henry Morgenthau Jr. to Mrs. Henrietta Klotz, Au- 
gust 5, 1945," 1-2. 

24 Unknown, “Undated Memo to Explain Henry Morgenthau Jr.'s August 5, 1945 Letter to Hen- 
rietta Klotz" (Washington DC: USHMM Accession 2015.255.1 Morgenthau Family Papers, Box 32, 
File 11, “Henrietta Klotz,” 184-85 in file, n.d.).

25 Morgenthau III, Mostly Morgenthaus (e-Book), 334. 

26 Morgenthau III, Mostly Morgenthaus (e-Book), 342, 344; Henrietta Klotz and Henry Morgenthau
III, “Henry Morgenthau III, Interview with Henrietta Klotz, September 19, 1978” (Washington
DC: USHMM 2015.255.1 Morgenthau Family Papers, Box 32, File 11, “Henrietta Klotz,” 1978),
58, 76. The Israel Bonds program was a fundraising tool initiated in 1951 to leverage the resources 
of Diasporic Jewish communities, especially in the U.S. and Canada, to assist the Israeli economy 
following the 1948 war and the influx of Holocaust survivors. 

27 Archive, “H.S. Klotz, 87, Aide to Treasury Secretary.”
her “links to the organized Jewish community.” Klotz “cultivated her boss's
[HMJ] budding association with [Chaim] Weizmann and [Meyer] Weisgal during
the war years.” The connection with Weizmann is especially notable considering
the antipathy between Morgenthau Sr. and Weizmann (see the section on
Morgenthau Sr. below). HM III underscores that his father's aims “were primarily
humanitarian” since in the beginning of HMJ's association with Montor and
Weisgal he “was not yet a committed Zionist.”

Henrietta explained that after HMJ was forced to leave the Treasury department,
“Nobody wanted him ... They wouldn't accept him ... he was a lost sheep ...
it broke my heart.” HM III depicted the situation this way: “After the humiliating
denouement of his career in government ... he needed the respect and love that
the Jewish community was ready to offer him.” When Meyer Weisgal, who was
at the time the fundraiser for the Weizmann Institute, made an appointment to
meet with HMJ, Morgenthau Jr. told Henrietta “that vulgarian is ... arranging a
meeting of the Jews and they insist that I come to this meeting.” Weisgal was
working on relief efforts for Jewish displaced persons, a cause that Morgenthau
felt deeply invested in. Mrs. Klotz noted that HMJ thought Weisgal was “very,
very vulgar.” So, she set up a preliminary meeting with Weisgal to prepare him for
the conference with her boss, so that HMJ would be favorably disposed to what
Weisgal proposed. Mrs. Klotz instructed Weisgal to “speak quietly and be very
gentle and use the correct word and not the vulgar word.” Then, after HMJ met
with Weisgal, Morgenthau Jr. exclaimed to Henrietta that Weisgal was "erudite" 
and "gentile." Mrs. Klotz believed that the United Jewish Appeal and Israel Bonds 


28 Morgenthau III, Mostly Morgenthaus (e-Book), 331. 

29 Monty Noam Penkower, “The Earl Harrison Report: Its Genesis and Its Significance," The 
American Jewish Archives Journal 68, no. 1 (2016): 11-12. Chaim Weizmann wrote to Henrietta in 
January 1944 to pass along a letter he had sent to Sam Rosenman that he hoped Henrietta would 
show to HMJ. The letter begins “My dear Henrietta” and ends “yours affectionately.” The letter
was printed on Jewish Agency for Palestine letterhead. Chaim Weizmann, "Letter from Chaim 
Weizmann to Henrietta Klotz, January 3, 1944," Morgenthau Diaries 689 (1944): 201. 

30 Morgenthau III, Mostly Morgenthaus (e-Book), 334. 

31 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September 
19, 1978," 53. 

32 Morgenthau III, Mostly Morgenthaus (e-Book), 335. 

33 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September 
19, 1978," 54-55. 

34 Morgenthau III, Mostly Morgenthaus (e-Book), 332. 

35 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September 
19, 1978," 54-55. 
needed someone like HMJ “to awaken the Jews to help Israel.” She created the
optimal conditions so that HMJ could improve his social and professional network 
but also so that he would further a cause that she believed in. 

Henrietta and Morgenthau traveled together nationally and internationally 
(especially to Israel) on behalf of United Jewish Appeal and eventually launched 
the Israel Bonds program (1949-1954). In Henrietta's words, Morgenthau Jr. 
“became kind of a little Jewish after a while.” She described how HMJ changed;
at first he was not comfortable with speech-making or with shaking so many
hands, but once he started seeing how much money he could raise by physically
engaging with audience members, “he just curdled. He changed tremendously.”
In reference to the urgency of HMJ's fundraising for the State of Israel, Henrietta
explained: “They had to build up Israel. It was a baby. They had to let it grow.”
Henrietta's descriptions could be applied to her impact on HMJ; she was a central 
figure in the growth of his Jewish identity. She built up his consciousness of 
the need to help the State of Israel. This is particularly striking considering his 
father's positions regarding the idea of an independent Jewish nation. 

Rabbi Herbert Friedman, a chaplain and Haganah agent in Europe, who knew 
HMJ from their collaboration on the 1947 UJA fundraising campaign, told HM III 
that Morgenthau Jr. was “proud to be doing in his generation what his father 
had done in his [Morgenthau Sr.’s] generation.” Friedman believed that HMJ
understood (from living through the Holocaust period) that “most of the world
didn't give a damn” about Jews. Morgenthau Jr. “became more and more deeply
involved ... he became more convinced that Palestine was the only solution.”
Before HMJ joined the UJA, the organization raised $35 million, but because of
him, the organization raised $102 million in 1946, $124 million in 1947, and $148
million in 1948.


36 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September 
19, 1978,” 55-56.

37 Archive, “H.S. Klotz, 87, Aide to Treasury Secretary”; Unknown, “Undated Memo to Explain 
Henry Morgenthau Jr.'s August 5, 1945 Letter to Henrietta Klotz," 1. 

38 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September 
19, 1978," 56. 

39 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September 
19, 1978," 57-58. 

40 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September 
19, 1978," 60. 

41 Morgenthau III, Mostly Morgenthaus (e-Book), 337-38. Haganah was the underground mili- 
tary organization in Palestine from 1920 to 1948. 

42 Morgenthau III, Mostly Morgenthaus (e-Book), 336, 338. 
During HMJ’s first visit to Israel in October 1948, he cried while carrying
Torah scrolls and dancing in the street with David Ben Gurion (Prime Minister of 
Israel) to celebrate Simchat Torah. The government named a settlement after him 
(Tal Shachar); at the dedication ceremony (presided over by Chaim Weizmann), 
HMJ called the event “one of the greatest moments of my life” and told the soldiers
gathered, “You are showing the world that the Jew is a fighting man ... you
have raised the standard of the Jew in the eyes of the Christian world.”


Figure 3: Henry Morgenthau Jr. and his secretary Mrs. Klotz meet with Israeli officials Golda 
Meir (Myerson) and Eliezer Kaplan. “Lake Tiberias — An important business conference with 
Finance Minister Kaplan, Labor Minister Myerson, and Mrs. Klotz." Photo credit: United States 
Holocaust Memorial Museum, courtesy of Henry Morgenthau III, October 5, 1950, photo id:
35016. 


According to Henry Morgenthau III, this meeting (Figure 3) occurred during Mor- 
genthau Jr.'s second visit to Israel in 1950. Elinor Morgenthau had died of a stroke 
in September 1949, only a few months before this trip. This was the meeting where 
the plans for the Israel Bonds drive were elaborated. HM III emphasized that Henrietta
“was constantly at [my father's] side.”


43 Morgenthau III, Mostly Morgenthaus (e-Book), 339-40. Simchat Torah is a Jewish holiday 
marking both the end and beginning of the annual cycle of public Torah readings. Torah scrolls 
are taken out of the ark and celebrants dance and sing. Tal Shachar means “Valley of the Dew" 
and Morgenthau means “morning dew." 

44 Morgenthau III, Mostly Morgenthaus (e-Book), 342. 
3 Visualizing “Ego-Centric” Historical Social
Networks in Nodegoat 


My inquiry into Henrietta Klotz's impact on Henry Morgenthau Jr. is an example 
of an “ego-centric” approach to social network analysis. According to Franzosi 
and Mohr, historical social network analysis (HSNA) is necessary for historians 
to move beyond traditional, linear, biographical narratives which predominate 
when using archival materials. Wetherell argues that historians have avoided
social network analysis (SNA) because they are unfamiliar with social science
quantitative methodologies. Franzosi and Mohr contend that HSNA produces
a complex non-linear representation of “broader social institutions” rather than
the mindset of individuals. These conceptions of HSNA maintain a dichotomy
and hierarchy between the study of the mindset of individuals using traditional
qualitative historical methodologies and the purportedly superior quantitative
tools of SNA that provide access to societal structures. The “fact” of the quantitative
nature of social network analysis is taken for granted. HSNA has tended
towards “whole network” approaches that are more comprehensive and structural
(read: quantitative) than “ego-centric network” approaches. Ego-centric
network analysis purportedly requires diaries and personal correspondence and
has not yet been used in a systematic way to track “affection and social support.”
“Structural” or “whole network” approaches are perceived to be quantitative
(and are gendered male), while the study of individuals is perceived to be
qualitative and “ego-centric” (and tends to be gendered female).
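
To make the contrast between ego-centric and whole-network views concrete, the following is a minimal sketch in Python. It is not nodegoat itself and not the dataset behind this chapter: it encodes a handful of relationships mentioned in the narrative above as a small graph with typed nodes. The node types ("person", "organization"), the edge labels, and the function names ego_network and group_by_type are all illustrative assumptions. Starting from a single ego (Henrietta Klotz) and following ties outward, the sketch suggests how institutions already appear two steps away, which is one sense in which an ego-centric view can also be structural.

```python
from collections import defaultdict

# Hypothetical typed nodes, named after people and institutions in the chapter.
# The type vocabulary is an assumption made for illustration only.
NODES = {
    "Henrietta Klotz": "person",
    "Henry Morgenthau Jr.": "person",
    "Henry Montor": "person",
    "Meyer Weisgal": "person",
    "Chaim Weizmann": "person",
    "United Jewish Appeal": "organization",
    "U.S. Treasury": "organization",
}

# Illustrative edges (source, target, label) drawn from the narrative,
# not from the author's nodegoat data.
EDGES = [
    ("Henrietta Klotz", "Henry Morgenthau Jr.", "assistant to"),
    ("Henrietta Klotz", "Henry Montor", "arranged meeting with"),
    ("Henrietta Klotz", "Meyer Weisgal", "arranged meeting with"),
    ("Chaim Weizmann", "Henrietta Klotz", "corresponded with"),
    ("Henry Morgenthau Jr.", "United Jewish Appeal", "chaired"),
    ("Henry Montor", "United Jewish Appeal", "executive vice president of"),
    ("Henry Morgenthau Jr.", "U.S. Treasury", "secretary of"),
]


def build_adjacency(edges):
    """Return an undirected adjacency map: node name -> set of neighbours."""
    adjacency = defaultdict(set)
    for source, target, _label in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def ego_network(ego, edges, radius=1):
    """Return the set of nodes reachable from `ego` in at most `radius` steps."""
    adjacency = build_adjacency(edges)
    reached = {ego}
    frontier = {ego}
    for _ in range(radius):
        # Expand one step outward, keeping only nodes not seen before.
        frontier = {n for node in frontier for n in adjacency[node]} - reached
        reached |= frontier
    return reached


def group_by_type(names, nodes):
    """Group node names by their type, with names sorted alphabetically."""
    grouped = defaultdict(list)
    for name in sorted(names):
        grouped[nodes[name]].append(name)
    return dict(grouped)


if __name__ == "__main__":
    for radius in (1, 2):
        members = ego_network("Henrietta Klotz", EDGES, radius=radius)
        print(radius, group_by_type(members, NODES))
```

In this toy example the one-step network contains only people, while the two-step network also reaches the United Jewish Appeal and the Treasury. A real analysis would need dated, sourced relationships rather than this hand-picked list.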

Feminist and postcolonial sociologists, anthropologists, and geographers have 
offered critiques of the ways that the “social” is often split from the "structural"; 
gender, race, and sexuality are often viewed as part of the “social,” so the “structure”
remains unimpacted by feminist, queer, and anti-racist critiques. Gender
is perceived as a variable (or an effect) rather than a complex analytical category; 


45 Roberto Franzosi and John W. Mohr, “New Directions in Formalization and Historical Analysis,”
Theory and Society 26, no. 2/3 (1997): 143, 145.

46 Charles Wetherell, *Historical Social Network Analysis," International Review of Social History 
43, supplement (1998): 125, accessed January 10, 2022, https://doi.org/10.1017/S0020859000115123. 
47 Franzosi and Mohr, *New Directions in Formalization and Historical Analysis," 143-48. 

48 Bonnie H. Erickson, “Social Networks and History: A Review Essay," Historical Methods: A 
Journal of Quantitative and Interdisciplinary History 30, no. 3 (1997): 150—51, accessed January 10, 
2022, https://doi.org/10.1080/01615449709601182. 

49 Wetherell, *Historical Social Network Analysis," 130. 

50 Gurminder K. Bhambra, “Sociology and Postcolonialism: Another ‘Missing’ Revolution," So- 
ciology 41, no. 5, Special Issue on Sociology and Its Public Face(s) (2007): 876-77. 
women are depicted as being affected by the social structure without altering the
definition of this structure.51 Feminist sociologists have challenged presumed
dichotomies and hierarchies between public and private, macro and micro, insti- 
tutional and cultural/emotional/embodied, material and discursive, and “the 
West and the rest.”52 Women are associated with the second term in each of these
pairings. Institutions and “theory” (what is said to constitute valid “knowledge”)
are controlled by men and are perceived to be more professional and intellectual, 
while women's work is associated with “applied” or “popular” knowledge in the 
realm of “social problems.”53 As feminist geographers have noted, women are often
left out of theorizing about the public realm of the workplace (coded as male); their 
role is perceived to be solely in the private realm where they provide emotional 
and sexual support to their male partners.54 Quantitative analysis can devalue
women's life histories (associated with the personal and relationships), relegating 
them to the margins of discussions of social structures and institutions. However, 
feminist sociologists have argued that life history should be defined as “doing 
social theory" because of the way that life history links what is typically defined 
as private (gender, sex, emotions) with what is often defined as public (politics, 
economics, education).55

I use this theorizing by feminist social scientists to argue that interpersonal 
or individual networks are simultaneously structural and integrate broader social 
institutions, such as national, political, and ethnic or racial affiliations. I simul- 
taneously engage qualitative methodologies and network visualizations without 
depending upon the statistical analyses of SNA. To visualize the relationship 
between Morgenthau Jr. and Klotz (and compare it to the relationships between 
HMJ and his father, Morgenthau Sr., and his wife, Elinor Fatman Morgenthau), I 
utilize a historical social network analysis tool called nodegoat.56


51 Barbara Laslett and Barrie Thorne, “Life Histories of a Movement: An Introduction,” in
Feminist Sociology: Life Histories of a Movement, ed. Barbara Laslett and Barrie Thorne (New
Brunswick, NJ: Rutgers University Press, 1997), 7, 15.

52 Barrie Thorne, “How Can Feminist Sociology Sustain Its Critical Edge?,” Social Problems 53,
no. 4 (2006): 476.

53 Aisha Khan, “Introduction,” in Women Anthropologists: Selected Biographies, ed. Ute Gacs et al.
(Urbana: University of Illinois Press, 1989), xvii-xviii.

54 Linda McDowell, “Space, Place and Gender Relations: Part I. Feminist Empiricism and the
Geography of Social Relations,” Progress in Human Geography 17, no. 2 (1993): 166, 170, 173, accessed
January 10, 2022, https://doi.org/10.1177/030913259301700202.

55 Laslett and Thorne, “Life Histories of a Movement,” 2-4, 6-8, 10.

56 Pim van Bree and Geert Kessels, “Nodegoat: A Web-Based Data Management, Network
Analysis and Visualization Environment,” LAB1100, accessed January 10, 2022, http://lab1100.com,
2013, http://nodegoat.net.
Nodegoat is a project of Lab1100 founded by Pim van Bree and Geert Kessels
(https://lab1100.com/). It is an open access web-based data management, 
network analysis and visualization environment. I selected nodegoat because of 
its versatility; it enables users to design their own data models and visualize his- 
torical social networks chronologically and geographically. Another key reason 
is that it supports the crafting of multi-layered networks in which various types 
of nodes (“objects”) and relationships can be displayed simultaneously or inde- 
pendently, promoting a “continuous process of interaction with data” leading to 
deeper “levels of interpretation.”57 Examples of multiple “nodes” or “objects” that
I created and visualized at one time in nodegoat are people and organizations
or institutions. I also created “categories” that can be visualized simultaneously
with these “objects”: racial and religious identities (see Figure 4). However, since 
one’s data model in nodegoat is designed by the researcher, any kind of “object” 
can be visualized simultaneously including (but not limited to): correspondence, 
events, archival collections, and publications. These are other kinds of objects 
that I created in my nodegoat database (but these are not visualized in the images 
included here). Because the nodegoat data model is designed by each researcher, 
the options are limitless; however, the more complicated the data model, the 
more complicated it becomes to decide what one wants to visualize and how to 
visualize it. The visualization of various types of nodes/relationships within one 
space is impossible within quantitative SNA, which is “ill-equipped to deal with 
multimodal networks” [networks with more than one kind of “node” or “object”] 
since when there are “three or more varieties of nodes, most algorithms used in 
network analysis simply do not work.”58
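
The idea of a researcher-designed, multimodal data model can be illustrated with a minimal Python sketch. It is not nodegoat's interface or export format: the `Node` class, the `kind` labels, and the `visible_edges` helper are hypothetical names introduced here only for illustration, and the few links are those named in this chapter. The sketch shows how several kinds of nodes can sit in one network and be displayed together or independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # "person", "institution", or "category" (labels assumed for this sketch)


# Nodes and links named in the chapter; the data structure itself is hypothetical.
KLOTZ = Node("Henrietta Klotz", "person")
HMJ = Node("Henry Morgenthau Jr.", "person")
WRB = Node("War Refugee Board", "institution")
ZIONISM = Node("Zionism", "category")
ORTHODOX = Node("Orthodox Jewish", "category")

EDGES = [
    (KLOTZ, HMJ),
    (KLOTZ, WRB),
    (HMJ, WRB),
    (KLOTZ, ZIONISM),
    (KLOTZ, ORTHODOX),
]


def visible_edges(edges, kinds):
    """Keep only the edges whose two endpoints both belong to the selected kinds."""
    return [(a, b) for a, b in edges if a.kind in kinds and b.kind in kinds]


# One layer at a time, or all layers together.
people_and_institutions = visible_edges(EDGES, {"person", "institution"})
everything = visible_edges(EDGES, {"person", "institution", "category"})
print(len(people_and_institutions), len(everything))
```

Hiding the "category" layer leaves only the links among people and institutions, which mirrors the choice of displaying layers simultaneously or independently.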

Most importantly for my purposes, nodegoat supports the theoretical
intersection of what is sometimes still gendered “female” - the micro, private, social,
and cultural - with what is still gendered as supposedly “masculine” - the macro,
public, institutional, and structural.59 Thus, one network visualization can reveal


57 Ingeborg van Vugt, “Using Multi-Layered Networks to Disclose Books in the Republic of
Letters,” Journal of Historical Network Research 1 (2017): 35, accessed January 10, 2022,
https://doi.org/10.5072/jhnr.v1i1.7.

58 Scott B. Weingart, “Demystifying Networks, Parts I & II,” Journal of Digital Humanities 1, no. 1
(2011): 3, accessed January 10, 2022,
http://journalofdigitalhumanities.org/11/demystifying-networks-by-scott-weingart/.

59 Bhambra, “Sociology and Postcolonialism,” 876-77; Ute Gacs et al., Women Anthropologists:
Selected Biographies (Urbana: University of Illinois Press, 1989), xiii; Laslett and Thorne, “Life
Histories of a Movement,” 8, 15, 21; Linda McDowell, “Space, Place and Gender Relations: Part
II. Identity, Difference, Feminist Geometries and Geographies,” Progress in Human Geography 17,
no. 3 (1993): 173, accessed January 10, 2022, https://doi.org/10.1177/030913259301700301; Thorne,
“How Can Feminist Sociology Sustain Its Critical Edge?,” 476.
both macro- and micro-level connections and visually demonstrate what are
typically perceived to be two different spheres (the supposed “political” sphere of men
and the presumed “private” sphere of women).

The ability to visualize “broader social institutions” and the mindsets of
individuals simultaneously was imperative for my analysis of how women (who were
often in roles with less power) managed to influence men in leadership positions 
within the U.S. government to bring relief and rescue to Jewish refugees during 
World War II. 

I found that nodegoat visualizations are clearer when only a small number 
of object types and categories are displayed at any one time (“scope” can be used 
for this purpose). Figure 4 incorporates only two object types (people and
institutions) and two categories (racial/ethnic and religious identities). My data model
contains other categories, but they are not included here for simplicity. Because 
I focus on ego-centric networks, it is important to be able to see each of the
“nodes” in the visualization and track who/what each is connected to; therefore,
displaying fewer types of “objects” at once is preferable for performing this
micro-level analysis.

These types of social network visualizations serve multiple purposes; they 
display the connections between data that has been entered into the self-designed 
database, but they also allow for additional exploration of the data. By clicking 
on any of the “nodes” (circles) in this visualization, one can learn more about 
that specific person, organization, or identity (based on what one has entered 
into one's own database) and discover the people, organizations, and identities 
that are linked with the “nodes” one is interested in. For example, by clicking on
“Zionism” or “Orthodox Jewish,” I can learn more about the connection between
Henrietta Klotz and these political and religious affiliations, but I can also
discover which additional people and institutions relate to these identities in my
database. 


4 Analyzing Political Influence Networks 


In this section, I demonstrate how the nodegoat network visualization above can 
be expanded to further explore the networks of influence on Henry Morgenthau 
Jr., including interpersonal but also ethnic, religious, and political connections. 
Especially considering HMJ’s assimilated Jewish upbringing and his father's 
non-Zionist positions, how can an analysis of Henrietta's influence enable new 
explanations for HMJ's eventual stances regarding relief and rescue efforts and 
his advocacy for Israel? 
Using a “filter” in nodegoat, a researcher can select specific people (in this
case HMJ and Henrietta Klotz) and then using “scope” within the visualization
settings, it is possible to visualize the people, institutions, racial, ethnic, and 
religious identities that influenced the key people who impacted HMJ and Mrs. 
Klotz. This enables a multi-layered visualization depicting micro and macro levels 
simultaneously. I measure “influence” qualitatively, by analyzing archival sources 
that describe these relationships (interviews and correspondence) or depict the 
relationships as they occurred (meeting transcripts in the Henry Morgenthau Jr. 
Diaries). 
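
As a rough analogy for combining a filter with a scope, the following Python sketch selects seed people and then collects everything within a chosen number of steps of them. It assumes that scope behaves like a hop limit, which is a simplification made here and not a description of how nodegoat is implemented; the function names are hypothetical, and the links are a small subset of those described for Figure 5.

```python
from collections import deque

# Undirected links taken from the chapter's description of Figure 5 (subset).
LINKS = [
    ("Henry Morgenthau Jr.", "Henrietta Klotz"),
    ("Henry Morgenthau Jr.", "Stephen Wise"),
    ("Henrietta Klotz", "Stephen Wise"),
    ("Henrietta Klotz", "Zionism"),
    ("Stephen Wise", "Zionism"),
    ("Gerhart Riegner", "Zionism"),
]


def build_adjacency(links):
    """Turn a list of undirected links into a neighbour lookup."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def ego_network(adjacency, seeds, depth):
    """Return the hop distance of every node within `depth` steps of any seed."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if distance[current] == depth:
            continue  # do not expand beyond the requested scope
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


adjacency = build_adjacency(LINKS)
seeds = ["Henry Morgenthau Jr.", "Henrietta Klotz"]  # the "filter"
one_step = ego_network(adjacency, seeds, depth=1)
two_steps = ego_network(adjacency, seeds, depth=2)
print(sorted(one_step))
print(sorted(set(two_steps) - set(one_step)))
```

In this toy data, Gerhart Riegner only appears once the scope is widened to two steps, because he is reached through the shared category "Zionism" and not through a direct link to either seed.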

There is no way within nodegoat's social visualizations to visually compare 
the degree of influence or strength of a relationship between two people, except 
to evaluate in how many institutions they share membership or with how many 
people they share relationships. For example, in Figure 5, we can detect that Hen- 
rietta and HMJ were both connected to Elinor Morgenthau, Henry Morgenthau Sr., 
Stephen Wise, Dean Acheson (Acting Secretary of State), and Harry Dexter White 
(Assistant Secretary of the Treasury). We can also see that Henrietta shared her 
Zionist affiliation with Stephen Wise and Gerhart Riegner (World Jewish Congress 
representative in Geneva, Switzerland), and worked on the War Refugee Board 
with HMJ, John Pehle, Randolph Paul, and Josiah DuBois. However, the
visualization alone does not provide more information about the content or level of
influence of any of these relationships. Within each person and institution “object,”
it is possible to create text fields to store this qualitative data that can be tagged 
based on the objects and categories in one's data model. The tags within text 
boxes can then be visualized (if desired). 
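
The one comparison the visualization does permit, counting shared memberships and shared relationships, can be sketched as a set intersection. This is a minimal illustration under stated assumptions: the dictionary below is a partial, hand-entered list of the connections named above for Figure 5, and `shared_connections` is a hypothetical helper, not a nodegoat function. The count says nothing about the content or strength of any tie, which still has to come from the archival sources.

```python
# Partial list of connections read off the chapter's description of Figure 5.
CONNECTIONS = {
    "Henry Morgenthau Jr.": {
        "Elinor Morgenthau", "Henry Morgenthau Sr.", "Stephen Wise",
        "Dean Acheson", "Harry Dexter White", "War Refugee Board",
    },
    "Henrietta Klotz": {
        "Elinor Morgenthau", "Henry Morgenthau Sr.", "Stephen Wise",
        "Dean Acheson", "Harry Dexter White", "War Refugee Board", "Zionism",
    },
    "Stephen Wise": {"Zionism"},
}


def shared_connections(connections, a, b):
    """Return the people, institutions, and categories linked to both a and b."""
    return connections.get(a, set()) & connections.get(b, set())


hmj_klotz = shared_connections(CONNECTIONS, "Henry Morgenthau Jr.", "Henrietta Klotz")
klotz_wise = shared_connections(CONNECTIONS, "Henrietta Klotz", "Stephen Wise")
print(len(hmj_klotz), sorted(klotz_wise))
```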

The connecting lines between “objects” (they are called “edges” within social 
network analysis) do not become thicker and the colors of the lines cannot be 
changed depending on the strength of a relationship. The “nodes” or “objects”
(the circles) grow larger when they have more connected “nodes,” but this does
not illustrate the strength of a relationship between any two people in the network.
In Figure 5, the size of the nodes reflects how many other objects and categories
are connected to them. HMJ is the largest circle because he is the center of this
ego-centric network and I have entered the most information into the database
about the people who influenced him, his institutional affiliations, and his racial and
religious identities. Since my ego-centric networks are not comprehensive and do
not represent the “whole network,” centrality, network, and path measures are
not relevant. 
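
The relation between node size and the number of connected nodes can be made concrete with a short count. The sketch below is an assumption-laden illustration and not the way nodegoat computes sizes: it simply tallies how many links touch each node in a small, hand-entered subset of the connections described in this chapter.

```python
from collections import Counter

# A small, hand-entered subset of links; every link counts the same.
LINKS = [
    ("Henry Morgenthau Jr.", "Henrietta Klotz"),
    ("Henry Morgenthau Jr.", "Elinor Morgenthau"),
    ("Henry Morgenthau Jr.", "Henry Morgenthau Sr."),
    ("Henry Morgenthau Jr.", "War Refugee Board"),
    ("Henrietta Klotz", "War Refugee Board"),
    ("Henrietta Klotz", "Zionism"),
]


def degrees(links):
    """Count how many links touch each node."""
    counts = Counter()
    for a, b in links:
        counts[a] += 1
        counts[b] += 1
    return counts


degree = degrees(LINKS)
largest = max(degree, key=degree.get)
print(largest, degree[largest])
```

Because each link adds exactly one to the count, the largest node is the one about which the most data has been entered, which is why size cannot stand in for the strength of any single relationship.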

This expanded and complex network visualization can be used to further 
explore the data. For example, how could Henrietta's connection with the
institutional and religious identities “Zionism” and “Orthodox Jewish” have influenced
HMJ’s political stances regarding Jewish refugees and advocacy for the State of 
Figure 5: This is a more complex network visualization, which expands upon the one above.

In this visual, the advisors/influencers (teal), institutions (yellow), racial/ethnic identities
(turquoise), and religious identities (light blue) of the people who influenced HMJ and
Henrietta Klotz are depicted. The people and institutions in this visualization are “objects,”
while the racial and religious identities are “categories” in nodegoat. This enables analysis of
an additional layer of influence.
Israel? How did Henrietta manage to become more influential than his father,
Henry Morgenthau Sr., and his wife, Elinor Fatman Morgenthau? His father was
against the creation of a Jewish nation-state and both HM Sr. and Elinor were
connected to the religious identity, “Jewish, but not religious.” In the next three
sub-sections, I compare the Jewish identities of Henrietta, Morgenthau Sr., and 
Elinor and their contrasting relationships to Henry Morgenthau Jr. In the first 
sub-section, I examine Mrs. Klotz's Orthodox Jewish upbringing, connection to 
Zionism, and how she employed her Jewish identity to impact HMJ's decision 
making. In the second sub-section, I compare Morgenthau Sr.'s experiences of 
antisemitism, betrayal from his Jewish peers, and his increasingly non-Zionist 
positions with his son's experience of admiration from his fellow Jews through 
his fundraising for the State of Israel. I also argue that it was Henrietta's loyalty, 
protectiveness, and encouragement that endeared her to Morgenthau Jr., since 
he did not receive this support from his father. In the third and final sub-section, 
I assess Elinor's assimilationism, Christian friendship networks, involvement in 
democratic politics, and philanthropic causes. I emphasize HM III's evaluation 
that his father was closer to Henrietta than he was to Elinor. 


4.1 Henrietta Used Silence as Political Strategy: “I Am Jewish
and His Cause Was Justified”


Herman Klotz, Henrietta's husband, explained that Henrietta's father, Harry 
Stein, was a “pious, devout Jew” who was also poor.60 Henrietta emphasized how
Orthodox her parents were: “believe me, my parents were orthodox Jews, but very
orthodox. I mean, really orthodox.”61 Herman was also raised Orthodox, though more
flexibly, and Zionism was a frequent part of his family's conversations. He also
underscored how likely it was that Henrietta's and his own background influenced
HMJ regarding Jewish matters.62 Henrietta informed HM III that even though she
was raised in an Orthodox Jewish home, she was not raised with Zionism. She 
learned about it during the second four years of her tenure at Treasury with HMJ, 
through her attendance at parties hosted by an unnamed Jewish woman. 


60 Klotz, “Herman Klotz to Henry Morgenthau III, October 16, 1985,” 7.

61 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September
19, 1978,” 48.

62 Klotz, “Herman Klotz to Henry Morgenthau III, October 16, 1985,” 7-8.

63 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September
19, 1978,” 48.
In one case, Henrietta hid the truth from Morgenthau Jr. so that he would
unfreeze funds that were needed to help a group of rabbis escape from Poland. A
rabbi (Kalmanowitz) asked for an appointment to see HMJ; he claimed he did not
speak English, so he brought another rabbi to translate for him.64 Neither of the
rabbis knew that Henrietta could understand both German and Yiddish, since,
as she explained to HM III, “I didn't look Jewish at that time. I was young and
I was very, very blond.”65 Kalmanowitz insisted (in Yiddish) that the translator
push HMJ more since “he looks like the kind you can get away with an awful
lot of things.” Then when HMJ explained that he would “take it under
advisement,” Kalmanowitz replied to his translator that once you hear that, “forget it.
He's going to do nothing,” and promptly pretended to faint. HMJ thought it was
real and asked Henrietta to get this rabbi into HMJ's car and back to
Kalmanowitz's hotel.66 At that point, Kalmanowitz said in German to his translator, “did I
cry well?” Henrietta explained to HM III: “of course I wouldn't tell your father
[HMJ] because [if I did] he'd never let another rabbi in. And I am Jewish and his
cause was justified.”67 The Treasury ended up releasing funds to this group of
rabbis and to thank HMJ, they threw him a “big dinner” and gave him “medals.”68
Mrs. Klotz only told HMJ the truth many years later when they were working with
Henry Montor on Israel Bonds.69 This anecdote demonstrates how Henrietta's
Jewish identity influenced her behavior with HMJ and in turn how these choices 
influenced the decisions he made in regard to rescue programs. In this situation, 
she chose to use silence or omission as a powerful tool of influence. 


64 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September
19, 1978,” 48.

65 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September
19, 1978,” 49-50. Henrietta's mother had learned Yiddish upon arrival in the U.S. She taught her
daughter both German and Yiddish.

66 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September
19, 1978,” 49.

67 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September
19, 1978,” 50.

68 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September
19, 1978,” 50.

69 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz, September
19, 1978,” 52.
4.2 Henry Morgenthau Sr.: “How Could a Father Say
That about His Son?”


Henry Morgenthau Sr. (also pictured in the nodegoat social network visualiza- 
tions, connected to both HMJ and Henrietta) was not a Zionist and had a falling 
out with Rabbi Stephen Wise over the issue of Zionism. HM Sr. believed that the 
U.S. was Zion for American Jews. This stemmed from Morgenthau Sr.'s desire to 
assimilate into the American upper class and wish for his son to be able to access 
professional and social networks which were beyond his own reach because of 
antisemitism.70 His life was “centered around his business, his ambition and his
son.” He feared “anything that would threaten his Americanism” and
discouraged his son from becoming involved in “Jewish things.”

HM Sr. had wanted to be Secretary of the Treasury or Commerce (or another 
cabinet position) when Woodrow Wilson was elected. Morgenthau Sr. had 
donated $20,000 to Wilson's campaign and was one of his original fundraisers. 
Despite the expectation among many in HM Sr.'s networks, he did not receive a
cabinet post, probably because of antisemitism.72 He was given the
ambassadorship in Turkey because, as Wilson explained, this was “the point at which the
interest of American Jews in the welfare of Palestine is focused, and it is almost
indispensable that I have a Jew at that post.”73 Rabbi Stephen Wise convinced
HM Sr. to take the ambassadorship to further Jewish interests in Palestine.74
Morgenthau Sr. felt that his ambassadorship to Turkey was a “Jewish slot”; it was the
“only diplomatic post to which a Jew can aspire,” he wrote in 1922.75

HM Sr.’s non-Zionist stance originated in his on-the-ground experience 
in Palestine in 1914, when he witnessed firsthand the “instinctive hatred” that
“Arabs” felt toward Morgenthau's group.76 HM Sr. felt that a Jewish state would


70 Beschloss, The Conquerors, 44-46; Morgenthau III, Mostly Morgenthaus (e-Book), 86, 91-92,
155-56, 165.

71 Beschloss, The Conquerors, 44-46.

72 Morgenthau III, Mostly Morgenthaus (e-Book), 92-93. In 1913, Stephen Wise wrote to HM
Sr., “no Jew has been appointed to a single place of importance. It seems an almost deliberate
slight.” Charles Strauss wrote to Senator James A. O'Gorman, also in 1913, that the newspapers
were criticizing President Wilson for his ingratitude to HM Sr. and were calling Wilson's neglect
a form of “race prejudice.”

73 Morgenthau III, Mostly Morgenthaus (e-Book), 90-93.

74 Morgenthau III, Mostly Morgenthaus (e-Book), 95-96.

75 Klotz and Morgenthau III, “Henry Morgenthau III, Interview with Henrietta Klotz,
September 19, 1978,” 6; Morgenthau III, Mostly Morgenthaus (e-Book), 91-93; Henry Morgenthau Sr. and
French Strother, All in a Lifetime (Garden City, NY: Doubleday, Page & Company, 1922), 160.

76 Morgenthau III, Mostly Morgenthaus (e-Book), 119.
only cause increased antisemitism.77 He feared that the Jewish campaign for a
national homeland would trigger a genocide, like what was happening to the
Armenians in the Ottoman Empire.78 HM Sr. also became concerned when leaders
of Aaron Aaronsohn's Jewish agricultural colony (Athlit, near Haifa) spoke about
the need to “drive out the Arabs.” Morgenthau Sr. described the profound feeling,
as he prayed alongside Muslims, Christians, and Jews in the Caves of Machpelah
in Hebron (near Jerusalem), that “we traced our religion back to the same source”
and remarked that these ten minutes “were undoubtedly the most sacred that I
have ever spent in my life.” He was so impressed by the farm living he observed
in Petah Tikvah (near Tel Aviv) and at Aaronsohn's colony in Athlit, that he
encouraged his son, HM Jr., to take up farming.80 Morgenthau Sr. did not oppose
the settlement of Eastern European Jews in Palestine, but only the creation of “a
limited national state.”81

Morgenthau Jr. echoed his father's beliefs in the importance of collaboration 
between minority ethnic or religious groups. In December 1943, his staff, includ- 
ing Henrietta, pushed him to directly approach FDR to enable the evacuation of 
Jews from France and Romania. HMJ was reluctant to approach the President 
as a “private citizen” and felt that he needed to locate his argument within his
role as Secretary of the Treasury.82 From this standpoint, he could emphasize the
“question of treating minority races.” Then he reaffirmed a sentiment his father
had expressed in 1914 and 1921, “Just because I am a Jew, why shouldn't I look
after the Jews, or the Catholics, or the Armenians?” Mrs. Klotz reassured him that
taking action would be connected to his role as Secretary of the Treasury: “You
got into this thing on a Treasury basis - on a financial basis. It has led into this
thing, you see.”83

In 1917, Zionists including Louis Brandeis, Felix Frankfurter, and Chaim
Weizmann sabotaged Morgenthau Sr.'s mission to set up a separate peace with Turkey
(initially endorsed by Woodrow Wilson, who later revoked his support). This
was a consequence of a temporary alliance between antisemites and Zionists who
wanted to establish a British-controlled Jewish state in Palestine. They depicted
HM Sr. as pro-German, pro-Turkish, and anti-Zionist. Morgenthau Sr. felt the ultimate
betrayal when his close friend Rabbi Stephen Wise headed a delegation of Jewish 


77 Morgenthau III, Mostly Morgenthaus (e-Book), 164. 

78 Morgenthau III, Mostly Morgenthaus (e-Book), 133-34. 

79 Morgenthau III, Mostly Morgenthaus (e-Book), 120, 123. 

80 Morgenthau III, Mostly Morgenthaus (e-Book), 114-15, 122-23. 

81 Morgenthau III, Mostly Morgenthaus (e-Book), 151, 155. 

82 Morgenthau Jr. et al., “Jewish Evacuation Meeting Transcript, December 18, 1943,” 85-86.
83 Morgenthau Jr. et al., “Jewish Evacuation Meeting Transcript, December 18, 1943,” 87. 
leaders (which did not include HM Sr.) who convinced Wilson to approve of the
Zionist plans.84 HM III argued that these negative incidents pushed his
grandfather to become increasingly opposed to Zionism.85 Importantly, Rabbi
Wise had officiated Morgenthau Jr.'s marriage to Elinor Fatman in 1916,86 a fact
which likely made Wise's advocacy with President Wilson even more hurtful to
HM Sr. In his 1921 “Zionism a Surrender, not a Solution,” HM Sr. opposed Chaim
Weizmann’s Zionist agenda, calling it “the most stupendous fallacy in Jewish
history,” an “‘eastern European proposal’” that would “‘cost the Jews of America
most [of what] they have gained.’”87

Henry Morgenthau III told Henrietta that his grandfather, Morgenthau Sr.,
used to warn his son (HMJ) “the Jews will stab you in the back.” To this, Mrs. Klotz
replied, “They didn’t stab him [HMJ] in the back ... they admired him
tremendously.”88 This exchange occurred in the context of a discussion about HMJ's public
speaking campaigns to fundraise for Israel. Morgenthau Sr.'s sense of betrayal
from fellow Jews likely began with the treatment he suffered from Rabbi Stephen
Wise, Louis D. Brandeis, and Chaim Weizmann in 1917. But he underwent another
betrayal from his former friend, Samuel Untermyer, who turned against HM Sr.
when Morgenthau Sr. criticized Zionism in 1921. Untermyer described HM Sr.'s
essay (“Zionism a Surrender”) as “personal egotism.” HM III explained that after
this falling out, his grandfather “quite understandably withdrew permanently
from all involvement with organized Jewry.”89

Rather than perceiving Morgenthau Sr.’s increasing antipathy toward 
Zionism as solely political, structural, macro, and institutional, I argue that uti- 
lizing nodegoat encourages the conceptualization of the intersections of macro 
and micro levels of influence. HM Sr.'s stances regarding Zionism shifted because 
of his personal experiences on the ground in Palestine, but also because of inter- 
personal duplicity he suffered from people he considered friends and colleagues. 
His advice to his son to avoid involvement in Jewish causes resulted from this 
subjective experience of disloyalty but also from the sharp acrimony he felt when 
antisemitism led him to be passed over for prestigious government positions. 

Henrietta recounted various anecdotes illustrating how much she disliked 
Henry Morgenthau Sr. because of what she perceived to be his “cruel” or “crude” 
behavior toward his son and toward her. She felt that Morgenthau Sr. “dominated”
his son and “was very disappointed” in him. When FDR appointed HMJ as the
Secretary of the Treasury, HM Sr. apparently stated, “he's [HMJ] not up for it” but
that HM Sr. would be. Henrietta exclaimed to Henry Morgenthau III: “I disliked
him from that time on. How could a father say that about his son?”90 Morgenthau
Sr. likely said this because of jealousy; he had wanted that position for himself. 

In contrast, Henrietta described HMJ as “fair to everybody” and
“honorable.”91 Henrietta recounted another anecdote indicating HM Sr.’s snobbery,
classism, and elitism toward her specifically. This incident as well as Morgenthau
Sr.’s deplorable treatment of his son (from her perspective) likely contributed to 
her assessment of the father. She was managing a property for the Morgenthau 
family and HM Sr. and his wife (Josephine) returned from Europe with various 
gifts. Josephine asked Henrietta which gift she would like but HM Sr. snapped, 
“oh she’ll [Henrietta] take anything she can get for nothing.” This offended Hen- 
rietta who “refused to accept anything.” When Josephine “went into this insti- 
tution” (unclear if it was a mental institution or nursing home) both HMJ and 
Henrietta visited her, while HM Sr. never did.92

Mrs. Klotz’s attitude toward HM Sr. reveals her protectiveness, devotion, 
and loyalty to HMJ (as the term “watchdog of the Secretary of the Treasury”
connotes). Morgenthau Jr. emphasized these characteristics in his 1945 letter to her:
“God help anyone who in your opinion was disloyal to me.”93 Henrietta told HM
III, “if your father had asked me to jump off the roof, I think I would have done
it.”94 Gabrielle Elliot Forbush (Elinor Morgenthau's friend from Vassar College),
who had introduced Henrietta to Morgenthau initially, told HM III that Henrietta
brought out the “‘suspicious vein’” in Morgenthau: “She wanted to protect him
and she was fiercely loyal. But she viewed everyone and everything in a
narrowly personal way.” Once Henrietta started to work at Morgenthau’s magazine,
American Agriculturalist, in 1922, “from then on they were together.”95 I maintain
that HMJ trusted and relied upon Henrietta and followed her advice because she 
was so devoted, loyal, and protective; this was the method through which she
exercised her power. She encouraged and advocated for him in a way that his 
father did not. Henrietta gave him the confidence in himself that allowed him 
to set himself apart from his father and advocate for Jewish refugees and for the 
State of Israel. HM III explained that after the deaths of HMJ's father in 1946 and 
his wife, Elinor, in 1949, the two people who had tried to “steer him away” from
“Jewish affairs” were gone; it was at this exact time that Morgenthau Jr. “became
the devoted servant of the Jewish community.” Morgenthau III reasoned that this
shift was a “rebellion” as well as a “craving for love, respect ....”96 Through
Henrietta's Jewish networks, she arranged for HMJ to be involved in organizations
where he received that love and respect he craved. 

In the December 1943 Jewish evacuation meeting discussed above, Henrietta 
pushed Morgenthau Jr. into action through a combination of encouragement and 
praise. Morgenthau wondered if he should bring other Jews (Samuel Rosenman 
or Herbert Lehman) with him to meet with Cordell Hull (the Secretary of State) 
and the President about the proposal to evacuate Jews from France and Romania. 
Morgenthau preferred to go as the “Treasury” because he was concerned about 
taking a “Jewish delegation” and thus drawing too much attention to his
Jewishness. Mrs. Klotz tried to embolden Morgenthau by exclaiming: “Mr. Morgenthau,
nobody would do - none of these people you mentioned, when they are put on
the spot, will do what you will do.”97 This interpersonal, social, or micro-level
explanation is necessary to appreciate his political or macro-level positions. 


4.3 Elinor Fatman Morgenthau: Jewishness was “Ignored 
Completely” 


HM III described his parents’ “malaise” regarding their Jewish identity. Elinor 
and Morgenthau Jr. saw being Jewish as a “kind of birth defect that could not be 
eradicated, but with proper treatment, could be overcome, if not in this
generation, then probably in the next.”98 HM III called his mother more “firmly
assimilationist” than HMJ and remembered how Elinor explained to her son that he
should tell his friends he was “just American” when they asked what his
religion was.99 HMJ and Elinor did not attend a synagogue or German Jewish social
clubs; “almost all” of their friends were elite German Jews, but “they never talked 
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about anything Jewish" and did not keep any Jewish religious objects at home.!?? 
HM III explained that his parents ate pork, celebrated both Christmas and Easter 
rather than Jewish holidays like Passover or Yom Kippur, did not go to Jewish 
doctors, dentists, or lawyers, and spent summers with Elinor's Protestant Vassar 
friends.!?! 

HM III explicitly referred to his mother's jealousy of Henrietta: “My mother 
had realized my father's attraction to his good-looking, intelligent, and ambitious 
young secretary.” Henry Morgenthau III termed this the “Henry-Elinor-Henrietta 
triangle.”'” HM III told Henrietta during their interview that his father “was very 
close to you and depended [on you].” Henrietta affirmed that this was true. HM 
III then stated: “He was as close to you as he was to any human being.” HM 
III implied that Henrietta played a much more influential role for his father than 
his wife, Elinor, did. This was because, as a traditional wife from an elite class, 
Elinor could not be a career woman (whereas Henrietta worked from the age of 21). 
According to HM III, Elinor preferred that HMJ avoid “the plight of European 
Jewry,” while Henrietta actively pushed HMJ to advocate for Jews. Also, Elinor 
became ill during World War II.104 Edna Friedberg maintains that toward the end 
of the war, Elinor “supported her husband's increasing Jewish involvement ... 
including campaigning on behalf of the new State of Israel.”105 Elinor visited the 
Emergency Refugee Shelter at Fort Ontario in Oswego, NY in 1944 with Eleanor 
Roosevelt and advocated for the continuing education of medical students who 
were refugees living at the camp.106 

Elinor Morgenthau studied theater at Vassar College (graduated in 1913), 
often acting as the male lead in the all-female productions, and spoke multiple 
languages including German, French, Spanish, and Russian. Her closest friends 
at Vassar were Protestant girls of “modest means”; as Gabrielle Forbush recalled, 
there were very few Jewish girls at Vassar and Elinor's Jewishness was “ignored 
completely.” Going to college was “very much the exception” among women 
of Elinor's generation.107 Maintaining feminine comportment was important to 
her.108 

After college, Elinor Fatman taught theater at the Neighborhood Playhouse, 
part of the Henry Street Settlement House on Manhattan's Lower East Side. Henry 
Street provided social services and encouraged Americanization for Eastern Euro- 
pean Jewish immigrants. The House was run by Lillian Wald, a nurse of German 
Jewish descent, and was a “kind of social service finishing school” for the children 
of elite families. Morgenthau Jr. also volunteered at Henry Street.109 HM Sr. 
had helped to create the Bronx House (an extension of the Henry Street House), 
and its music school was supported by Josephine (HMJ's mother), who was a talented 
musician.110 Elinor's work at Henry Street was in line with the Morgenthau 
family's commitment to philanthropy, focusing on Jewish assimilation into white 
Christian cultural norms. 

Though Elinor could not continue her career as an actress because she had 
to focus on being a mother and “farmer’s wife,” she used her theater skills when 
she became a speaker for the state Democratic Committee, Women’s Division.111 
Elinor trained in public speaking to promote Morgenthau Jr.'s political career 
and fill in for him when he could not attend speaking engagements. In 1941, she 
became Eleanor Roosevelt's assistant in the Office of Civilian Defense, but she 
soon resigned (because of administrative struggles and “Eleanor-haters”). After 
this, “her private overseeing” and “careful advice” to her husband “declined.” 
Henrietta began to have an increasing influence on HMJ as evidenced in his 
stance regarding the “plight of European Jewry.”112 Elinor's influence on Morgenthau 
Jr. was more in the realm of democratic politics than in Jewish affairs, 
though it seems that she became slightly more engaged in Jewish causes near 
the end of the war. Unfortunately, she had a heart attack in April 1945 and then a 
stroke in September 1949 that led to her death.113 


5 Conclusion 


A case study exploring Henrietta Stein Klotz's impact on Henry Morgenthau Jr., 
the Secretary of the Treasury from 1934 to 1945, highlights the power that women 
invoked in political decision making, in the position of secretary, a role not 
often correlated with the exercise of political authority. Specifically, exploring 
Mrs. Klotz's affiliations and actions can clarify why Morgenthau Jr. became more 
explicitly involved in Jewish causes, despite his father's and wife's avoidance of 
participation in “Jewish affairs.” Henrietta used strategies like encouragement, 
loyalty, protectiveness, advocacy, and even sometimes silence or omission to 
push Henry Morgenthau Jr. to take political positions that were aligned with her 
own. Digital tools, like nodegoat (a database and network visualization platform), 
can make visible the people who are often either invisible or understudied in his- 
tories of the U.S. government's responses to the Holocaust: women, and more 
specifically, secretaries. Because nodegoat permits the visualization and explo- 
ration of multiple types of “objects” and "categories" simultaneously (whereas 
other network visualization software only reveals one type of object at a time), 
nodegoat supports the intersection of interpersonal and structural/institutional 
webs of influence and reveals how what is traditionally defined as feminine, 
micro, private, social, or cultural, is concurrently macro, public, and structural. 
This digital approach opens new ways to conceptualize women's and secretaries' 
influence on political decision making related to rescue and relief of refugees and 
fundraising after the war for the State of Israel. More broadly, historical social 
network analysis, as shown here, can help bring hitherto unheard voices and 
ignored agencies to the fore, offering an important contribution to Jewish Studies 
in the Digital Age. 


Zef M. Segal 
Constructing the Modern Jewish “Present”: 
Time and Time Cycles in HaTzfira 


Abstract: The modern periodical is an important medium in the construction of 
time. Its appearance and cycles of production turn artificial time cycles into seem- 
ingly natural and accepted social rhythms. Most importantly, periodicals play an 
important role in the construction of the “present” as a time frame of occurrences 
that happen “now”. However, the reproduced “present” shouldn't be understood 
independently of the production cycle of the periodical. 

Accordingly, this study characterizes the differences resulting from the shift 
in time cycles of the nineteenth-century Hebrew periodical HaTzfira. This period- 
ical started in 1862 as a weekly and was transformed in 1886 into a daily. In order 
to explore the change, this chapter compares the discourse in the three years 
prior to the conversion of this weekly into a daily (1883-1885) with the discourse in 
the three years following this conversion (1886-1888). 

Through the use of computational tools, and in particular topic modeling 
algorithms, which offer a general overview of large-scale textual corpora, this 
chapter compares discursive patterns before and after 1886. This comparison is 
based, on the one hand, on a nuanced qualitative analysis of the resultant topics, 
and on the other hand, on an original mathematical analysis of the resultant 
vector space. On a theoretical level, this comparison helps characterize the dif- 
ferences between the discursive rhythms of weeklies and dailies. It also contrib- 
utes to the introduction of computational tools into the study of Hebrew historical 
journalism. 


Keywords: temporality, topic modeling, Hebrew journalism, dailies, nineteenth 
century, print culture, digital humanities, history of time 


Yet if the present were always present, it would not pass into the past: it would not be time 
but eternity. If then, in order to be time at all, the present is so made that it passes into the 
past, how can we say that this present also “is”? The cause of its being is that it will cease 
to be. So indeed we cannot truly say that time exists except in the sense that it tends toward 
non-existence.1 


History is the study of time: locating and understanding events, people, and ideas 
that existed in a certain time, along a certain timeline. But the concept of time 
itself is usually taken for granted. In his Confessions, St. Augustine says “What 
then is time? Provided that no one asks me, I know. If I want to explain it to an 
enquirer, I do not know.”2 Unfortunately, the work of historians is to do exactly 
that, ask questions: what is time? whose time? and most importantly, “how is 
it that everyone, in making a choice, constructs their own personal time while 
still remaining subject to the restraints of social and natural time?”3 Norbert Elias 
begins to answer these questions by stating that “timing thus is based on people's 
capacity for connecting with each other two or more different sequences of 
continuous changes, one of which serves as a timing standard for the other (or 
others).”4 Taking Elias's definition as a guideline, this article explores the history 
of “timing standards,” as they are expressed within a 19th-century Jewish journal. 

The modern periodical is an important medium in the construction of time. 
Its appearance and cycles of production turn artificial time cycles into seem- 
ingly natural and accepted social rhythms. Most importantly, periodicals play an 
important role in the construction of the “present” as a time frame of occurrences 
that happen “now.” As a result of the public simultaneity of newspaper production 
and newspaper consumption, periodicals create simultaneous social and 
national times “marked by temporal coincidence, and measured by clock and 
calendar.”5 However, the reproduced “present” shouldn't be understood independently 
of the production cycle of the periodical. Time is not “a smooth and 
standard sequence of ticks of a clock to be lived through, but rather a sequence 
of whole blocks of time which contain partially predictable and broadly recurring 
sets of meaningful events. It is around these blocks of time that we construct the 
cycles which organize social time into smaller and larger temporally embedded 
structures.”6 Accordingly, this study intends to characterize the differences resulting 
from the shift in time cycles of the 19th-century Hebrew periodical HaTzfira. 
This periodical started in 1862 as a weekly and was transformed in 1886 into a 
daily. In order to explore the change, I compare the discourse in the three years 
prior to the conversion of this weekly into a daily (1883-1885) with the discourse 
in the three years following this conversion (1886-1888). 


4 Norbert Elias, Time: An Essay (Oxford: Blackwell, 1992), 72. 

5 Benedict Anderson, Imagined Communities: Reflections on the Origin and Spread of Nationalism 
(London: Verso, 1991), 32-35. 

6 J. David Lewis and Andrew J. Weigert, “The Structures and Meanings of Social Time,” Social Forces 
60, no. 2 (1981): 439. 

Previous academic analysis of journalistic news production in general, and 
news temporality in particular, has relied primarily on human-driven techniques 
like traditional content analysis.7 However, as Ryan Cordell argues, the digitization 
of periodical archives has resulted in a shift in their study.8 This shift is not 
just in the increasing accessibility of these journals; it is an actual methodological 
shift that allows researchers to deal with questions that were previously too 
difficult to ask. The growing academic field of periodical studies is a direct result 
of innovative computational techniques and widespread digitization in the 
last two decades.9 Instead of treating journals as mere conveyors of discrete bits 
of information, computational tools have transformed them into autonomous objects of 
analysis. They offer a new perspective on large-scale textual corpora which can 
reveal patterns that traditional modes of analysis typically cannot. 

In an attempt to map and characterize the “present” in HaTzfira's discourse, I 
apply algorithmic topic-modeling analysis, which enables identification of latent 
themes in the corpus. While applying computational tools to the study of historical 
journalism has become relatively well known and accepted, it is still absent 
from the study of historical Hebrew journalism.10 Nevertheless, such application is 
possible because of recent breakthroughs in the processing of previously digitized 
historical Hebrew periodicals.11 These breakthroughs allow us to dramatically improve the 
Optical Character Recognition (OCR) recognition rate of these periodicals and, 
consequently, use digital tools to analyze them. 


7 Nikki Usher, Making News at the New York Times (Ann Arbor: University of Michigan Press, 
2014); M. Neiger and K. Tenenboim-Weinblatt, “Understanding Journalism through a Nuanced 
Deconstruction of Temporal Layers in News Narratives,” Journal of Communication 66, no. 1 (2016): 
139-60; B. Zelizer, “Epilogue: Timing the Study of News Temporality,” Journalism 19, no. 1 (2018): 
111-21. 

8 Ryan Cordell, “What Has the Digital Meant to American Periodicals Scholarship?,” American 
Periodicals: A Journal of History and Criticism 26 (2016): 2-7. 

9 Sean Latham and Robert Scholes, “The Rise of Periodical Studies,” PMLA 121 (2006): 517-31; 
Maria DiCenzo, “Remediating the Past: Doing ‘Periodical Studies’ in the Digital Era,” ESC: English 
Studies in Canada 41 (2015): 19-39. 

10 Zef Segal and Oren Soffer, “One Journal, One Decade, 3,797,592 Words: Computational Analysis 
of HaTzfira's Discourse (1874-1883),” Journal of Jewish Studies 72, no. 2 (2021): 369-96; Zef 
Segal, “‘From One End of the Earth to the Other End of the Earth’: Changing Perceptions of the 
World in Late-Nineteenth-Century Hebrew Journalism,” Jewish Studies Quarterly, forthcoming. 

11 Oren Soffer et al., “Computational Analysis of Historical Hebrew Newspapers: Proof of Concept,” 
Zutot: Perspectives on Jewish Culture 17 (2020): 97-110. 

Before exploring the changing concept of “present” in HaTzfira, I will provide 
historical and methodological background. I begin by discussing the circumstances 
that brought about the specific shift of HaTzfira from a weekly publication 
to a daily one. This is followed by a discussion of the chosen methodology 
of topic modeling. The second part of the article discusses the topic-modeling 
analysis of HaTzfira's discourse. 


1 The Acceleration of the Hebrew Press 


HaTzfira was founded and published in Warsaw in 1862 by Haim Zelig Slonimski, 
who also served as its editor. This newspaper and other Hebrew periodicals, such 
as HaMaggid and HaMelitz, were part of an extensive network of Jewish journals 
published not only in Hebrew but also in Yiddish and other European languages.12 
Publication of HaTzfira ceased after six months and was restarted 12 years later, 
first in Berlin and later in Warsaw. 

The original intent of HaTzfira's founder was to promote scientific and technological 
knowledge among the observant Jews in Eastern Europe.13 However, 
the style and subject matter of HaTzfira soon changed. In the renewed HaTzfira 
(from July 1874) an attempt was made to satisfy the readers' interest in Jewish 
polemics and world politics. From 1874, world news rather than scientific innovations 
filled the pages of the newspaper.14 The second, and more significant, 
change occurred in 1881, following the eruption of pogroms in southwest Russia: 
world politics lost their dominance in favor of Jewish-related matters, especially 
those concerned with antisemitism.15 

Another important milestone in HaTzfira occurred in 1886, when it became a 
daily newspaper. The decision of the editors to accelerate the frequency of publication 
was a response to similar transformations in other Hebrew periodicals. The 
first Hebrew daily was HaYom, first published in Saint Petersburg on January 31, 
1886, which was soon followed by HaTzfira (April 13, 1886) and HaMelitz (July 12, 
1886). The old generation of editors, who established the Hebrew weeklies in the 
1850s and 1860s, warned that Hebrew journalism had not reached a stage that 
could support production of content suitable for a daily publication.16 Moreover, 
they assumed that most general news was irrelevant for a Jewish audience and 
therefore did not see any merit in accelerating the frequency of news production. 
In contrast, a younger generation of journalists and editors, who began taking a 
leading role in the 1880s, saw the shift from weekly to daily as an inevitable step 
toward the modernization and popularization of the Hebrew press. They recognized 
the rapid growth in Hebrew readership of the 1880s and understood the 
need of readers for a daily newspaper. This clash between generations was also 
evident within HaTzfira’s editorial board. The 76-year-old founding editor of 
the periodical, Slonimski, opposed the attempt to transform HaTzfira into a daily, 
while the 26-year-old Nahum Sokolov saw this transformation as unavoidable. 
This clash ended in Sokolov's victory. Starting from 1886, Sokolov became an 
equal partner to Slonimski as owner and editor of the newspaper. 


12 Israel Bartal, “‘Mevaser U-Modi'a Le-Ish Yehudi’: Ha-Itonut Ha-Yehudit Be-Afik shel Hidush,” 
Katedra 71 (1994): 154-64 [Hebrew]; Israel Bartal, “Mi-‘Kahal’ Le-Kehilat Kor'im,” in Ein Le-Falpel! 
Iton Ha-Tzfira ve-Ha-Modernizatzia shel Ha-Si'ach He-Hevrati Ha-Politi, ed. Oren Soffer (Jerusalem: 
Mosad Bialik, 2007), ix-xiii [Hebrew]; Oren Soffer, “‘Paper Territory’: Early Hebrew Journalism 
and Its Political Roles,” Journalism History 30 (2004): 31-39. 

13 Oren Soffer, Ein Le-Falpel! Iton Ha-Tzfira ve-Ha-Modernizatzia shel Ha-Si'ach He-Hevrati 
Ha-Politi (Jerusalem: Mosad Bialik, 2007) [Hebrew]. 

14 Segal and Soffer, “One Journal, One Decade.” 

15 Soffer et al., “Computational Analysis.” 

The use of computational tools, and more specifically topic-modeling anal- 
ysis, helps detect the changes that occurred in HaTzfira's discourse following its 
shift to daily publication, in terms of both the pattern of the topics discussed and 
their content. The following section briefly discusses the methodology of topic 
modeling. 


2 Topic Modeling 


The problem with periodical studies is that the very features of periodicals, “their seriality, 
abundance, ephemerality, diversity, heterogeneity,” have “posed problems for those who wanted 
to access their contents.”17 As a result, distant reading approaches, such as topic modeling, 
that extrapolate backwards from a collection of documents to infer the discourses 
that could have generated them, have been used by many scholars over the past 
decade.18 

Developed in 2003, this generative statistical technique identifies groups of 
words that tend to occur together in a large collection of documents. It assigns 
each appearance of a word within a single document to one of a given number of 
topics. The number of topics is one of the parameters required for the execution 
of the algorithm; however, there are no generalized guidelines for the optimal 
number of topics. This choice is usually made following experimentation with 
various numbers of topics.19 These are not necessarily “topics” in the sense of a 
theme, since the algorithm has no knowledge of the actual content or context of 
the words. A “topic” is merely a pattern of co-occurring words. However, these 
clusters of co-occurring words are rarely random, and they allow us to infer the 
latent structure behind a collection of documents. 


16 Gideon Kouts, News and History (Jerusalem: The Zionist Library, 2013), 81 [Hebrew]. 

17 James Mussell, The Nineteenth-Century Press in the Digital Age (London: Palgrave Macmillan, 
2012), 2. 

18 C. Jacobi, W. Van Atteveldt, and K. Welbers, “Quantitative Analysis of Large Amounts of Journalistic 
Texts Using Topic Modelling,” Digital Journalism 4, no. 1 (2016): 89-106; Segal and Soffer, 
“One Journal, One Decade.” 
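
The assignment of every word occurrence to one of a fixed number of topics can be illustrated with a small sketch. The chapter does not state which implementation was used, so the following is only an assumption-laden toy: it assumes latent Dirichlet allocation fitted by collapsed Gibbs sampling, and the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the four miniature "documents" are all invented for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA: assigns every word token to a topic."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per topic in each document
    topic_word = [Counter() for _ in range(num_topics)]   # tokens per word in each topic
    topic_total = [0] * num_topics                        # tokens per topic overall
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Weight each topic by how common it is in this document
                # and how strongly it is associated with this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return assignments, doc_topic, topic_word


# Invented toy corpus; real input would be the tokenized issues of the journal.
docs = [
    "ruble price trade sugar flour price".split(),
    "rabbi prayer synagogue rabbi torah".split(),
    "price ruble trade flour".split(),
    "torah prayer rabbi synagogue".split(),
]
assignments, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [word for word, n in counts.most_common(3) if n > 0])
```

The sampler knows nothing about meaning: it only tracks which words co-occur in which documents, which is why the resulting "topics" still need human interpretation.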

Once topics are generated and assigned to every appearance of each word, 
the corpus, its documents, and words used in the corpus become vectors that 
reflect the distribution of topics within each object. For example, if document A 
includes 100 words, of which 55 words are affiliated with “topic 1,” 45 words with 
“topic 2,” and none with “topic 3,” then I could define document A by its distribution 
vector (55, 45, 0) and compare it with the vectors of other documents. Similarly, 
if the word “dog” appears 100 times throughout the whole corpus, and 50 of 
these appearances are affiliated with “topic 1,” 30 appearances are affiliated with 
“topic 2,” and 20 appearances are affiliated with “topic 3,” then the word “dog” 
could be defined by its distribution vector (50, 30, 20). Thus, topic modeling can 
be used to detect changes within the corpus or similarities between documents or 
words. The technique's main advantage is its ability to switch between different 
levels of association within a corpus: words, documents, and topics. 
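
The two worked examples above can be written out directly. The sketch below uses the vectors given in the text for document A and for the word “dog”; the second document (`document_b`) and the choice of cosine similarity as the comparison measure are assumptions made for illustration, since the text does not name a specific measure.

```python
import math


def normalize(counts):
    """Turn raw topic counts into shares that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def cosine_similarity(u, v):
    """Cosine of the angle between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


document_a = [55, 45, 0]   # from the text: 55 words in topic 1, 45 in topic 2
word_dog = [50, 30, 20]    # from the text: distribution of the word "dog"
document_b = [10, 10, 80]  # hypothetical document dominated by topic 3

shares_a = normalize(document_a)
sim_a_dog = cosine_similarity(document_a, word_dog)
sim_a_b = cosine_similarity(document_a, document_b)
print(shares_a, round(sim_a_dog, 3), round(sim_a_b, 3))
```

Because documents, words, and time periods all live in the same topic space, the same comparison function works at every level of the corpus.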

As stated previously, the generated topics have no pre-defined meaning. In 
this study, the meanings of the unsupervised generated topics were identified 
by examining the semantic relations between the most frequent terms in each 
topic, as well as by reading the journal issues that were statistically identified as 
the most reflective of the topic. This allowed a critical evaluation of the computational 
output in order to make sure the results of the algorithm were not meaningless 
or arbitrary. In addition, the results were evaluated against existing qualitative 
scholarship. 

The corpus consists of six years of HaTzfira's discourse (1883-1888), which 
includes three years prior to the change from weekly to daily and three years after 
that change. On the one hand, this corpus is broad enough to show long-term 
changes and pursue meaningful computational analysis; on the other hand, it is 
focused enough to avoid discursive influences of significant historical events, such 
as the 1881 pogroms or the 1897 First Zionist Congress. I ran three different 
topic models on the corpus, each designed to produce 20 topics and in which the 
units of analysis were individual issues of the journal. The first was conducted on 
the whole corpus (860 issues), thus enabling a broader overview and a detection 
of changes and patterns over the full period (see Section 3). The second and third 
models were conducted on sub-corpora: 150 issues published 1883-1885, and 710 
issues published 1886-1888 (see Section 4). These two models allow us to characterize 
the discourse in each period and compare patterns and topics before and 
after the change from weekly to daily. 


19 H.M. Wallach, D.M. Mimno, and A. McCallum, “Rethinking LDA: Why Priors Matter,” Advances 
in Neural Information Processing Systems 22 (2009): 1973-81. 


3 The “Longue Durée”: 1883-1888 


Six years cannot be considered what the French historian Fernand Braudel defines 
as a longue durée, but the concept suits the aim of the first topic model. Braudel 
describes a history that opposes daily events, that is not episodic in nature, one 
which “requires getting to know slower temporalities, almost immobile ones.”20 
Accordingly, before identifying and exploring the characteristics of the weekly 
“present” and the daily “present,” I look at a larger picture, a corpus consisting of 
both periodicities at once. First, I use vector analysis to compare topic distribution 
in each six-month period during the time span. The choice to group together issues 
published within six months ensures the readability of the data. Each six-month period 
is understood as a 20-dimensional vector, in which each coordinate denotes the 
share of the corresponding topic within that period's discussions. Principal component 
analysis (PCA) reduces the 20-dimensional space into a two-dimensional visualization, 
in which the axes are chosen to maximize the variance among the vectors.21 
Figure 1 shows the differences between the periods in the relevant time span. 
Although a longue durée approach concerns itself with continuity and gradual 
evolution, the distribution along the x-axis in this figure clearly differentiates 
between two groups of points. The left group consists of all the periods after the 
transformation, when the journal was published as a daily, and the right consists 
of all the periods before the conversion, when the journal was published as a 
weekly. Besides lying far apart from each other, the two groups show entirely 
different patterns along the y-axis. The weekly group of points is 
heavily clustered with seemingly no chronological rationale behind its distribution. 
In contrast, the daily group is aligned chronologically from bottom to top with 
a relatively uniform distribution of points, with the exception of the second half of 
1887. The different characteristics of these two groups reflect a rupture, almost as 
if the analysis had shifted to a different newspaper. Considering the relative continuity 
in the internal and external circumstances surrounding the journal, we can 
assume that the change from weekly to daily caused this rupture. 

Figure 1: Principal component analysis (PCA) of topic distribution, 1883-1888. Each axis 
reflects a single dimension of the vectors as identified by the algorithm. The percentage listed 
on each axis reflects the proportion of variance that the specific axis reveals. The distinctions 
between left and right, between up and down, as well as the values listed along each axis 
have no significance in themselves. They show the differences between vectors. Visualization 
is done with ClustVis. 


20 Fernand Braudel, “Histoire et sciences sociales: La longue durée,” Annales 13, no. 4 (1958): 
725-53. 

21 Christof Schöch, “Principal Component Analysis for Literary Genre Stylistics,” The Dragonfly's 
Gaze, last updated September 8, 2016, accessed January 23, 2020, http://dragonfly.hypotheses.org/472. 
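
The two steps described in this section, grouping issues into six-month vectors and projecting those vectors onto two principal components, can be sketched as follows. This is not the chapter's pipeline (the figure was produced with ClustVis): the sketch assumes an unweighted average of issue-level topic shares per half-year, uses three topics instead of twenty, computes the components by simple power iteration, and all data and function names (`half_year_vectors`, `pca_2d`) are invented for illustration.

```python
import math
import random
from collections import defaultdict


def half_year_vectors(issues):
    """Average issue-level topic shares within each (year, half) period."""
    groups = defaultdict(list)
    for (year, month), shares in issues:
        groups[(year, 1 if month <= 6 else 2)].append(shares)
    return {
        period: [sum(col) / len(rows) for col in zip(*rows)]
        for period, rows in sorted(groups.items())
    }


def leading_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    dim = len(matrix)
    vector = [rng.random() + 0.1 for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vector))
    vector = [x / norm for x in vector]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            return 0.0, vector
        vector = [x / norm for x in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(dim)) for i in range(dim)]
    return sum(v * p for v, p in zip(vector, product)), vector


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    n, dim = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    total_variance = sum(cov[i][i] for i in range(dim))
    components, explained = [], []
    for _ in range(2):
        eigenvalue, vector = leading_eigenvector(cov)
        components.append(vector)
        explained.append(eigenvalue / total_variance)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * vector[i] * vector[j] for j in range(dim)]
               for i in range(dim)]
    points = [[sum(a * b for a, b in zip(row, comp)) for comp in components]
              for row in centered]
    return points, explained


# Invented issue-level topic shares: ((year, month), [topic 1, topic 2, topic 3]).
issues = [
    ((1885, 2), [0.60, 0.30, 0.10]), ((1885, 5), [0.58, 0.32, 0.10]),
    ((1885, 9), [0.62, 0.29, 0.09]), ((1885, 11), [0.60, 0.31, 0.09]),
    ((1886, 8), [0.20, 0.30, 0.50]), ((1886, 10), [0.18, 0.30, 0.52]),
    ((1887, 3), [0.22, 0.28, 0.50]), ((1887, 9), [0.21, 0.27, 0.52]),
]
periods = half_year_vectors(issues)
points, explained = pca_2d(list(periods.values()))
for period, point in zip(periods, points):
    print(period, [round(x, 3) for x in point])
```

In this toy data the two 1885 half-years fall on one side of the first axis and the later half-years on the other, which mirrors the kind of left/right separation the figure is read for; the sign of each axis is arbitrary, as the caption notes.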

However, some form of continuity reveals itself when examining the tempo- 
ral distributions of each topic. In general, we can distinguish between two types 
of distributions: (i) temporary topics that vary significantly over time and (ii) con- 
stant topics. Figure 2 shows the ten constant topics during the whole time span. 
Although these topics are characterized by continuity, we can still see that 1886 
was a turning point. For some it meant a steep decline and for others a sharp 
rise. 

Figure 2: The temporal shift in the distribution of constant topics, 1883-1888. Each line 
signifies a single topic, listed in the legend. The value at each point marks the percentage of all 
the content during the relevant period. 


Four topics, making up 36% of all the discussions between 1883 and 1888, 
remained relatively constant throughout the entire time span: topics 11, 13, 
and 20 are related to Jewish and rabbinical discourse and topic 14 is related to 
natural sciences. The consistency of these topics despite the change in publication 
frequency can be explained by the fact that they formed the backbone of 
HaTzfira as a 19th-century Jewish periodical structured on a heritage of Jewish 
Enlightenment. Beetham discusses three different times that appear in a periodical: 
“monumental,” “masculine,” and “feminine” times.22 “Monumental” time 
includes large historical forces; “masculine” time is “linear time, or the time of 
history and politics, the world of production associated with men”; and “feminine” 
time is “the time of reproduction, characterised as repetitive and circular 
rather than linear and progressive.” The four topics can be considered the “monumental 
time,” existing beyond the realms of frequency of publication. 

The next section provides an in-depth analysis of each period by creating two 
different topic models, one for the period 1883-1885 (the weekly era), and another 
for the period 1886-1888 (the daily era). Rather than looking for similarities, this 
section will identify the changes. 


22 Margaret Beetham, “Time: Periodicals and the Time of the Now,” Victorian Periodicals Review 
48, no. 3 (2015): 332. 


4 Changing Temporalities 


As mentioned previously, some of the topics were constant while others reflect 
the temporal and changing nature of journalistic discourse. Figure 3 depicts the 
constant topics in the first period of time, during which the journal was published 
as a weekly, and Figure 4 depicts the constant topics in the daily era (for further 
information and an explanation of the topic numbers see Tables 1 and 2). What 
stands out in the comparison between both graphs is the rise in number of con- 
stant topics, from eight in the weekly era to eleven in the daily era, and their 
share of all the discussions in the journal, from 70% in the weekly era to 80% in 
the daily era. 

The constant topics mainly relate to the recurring themes of a Hebrew journal 
of the late 19th century: the general paratext of the journal is affiliated with topic 
14 in the weekly era and topic 8 in the daily era; recurring news regarding German 
politics and Jewish affairs is affiliated with topic 1 in the weekly era and topics 
1 and 4 in the daily era; religious discussions appear as part of topic 6 in the 
weekly era and topics 13 and 19 in the daily era; and the heated debates on the 
fate of Jews in Europe appear in topics 5, 16, and 17 in the weekly era and topics 
8 and 16 in the daily era. Topic 20 in the weekly era and 17 in the daily era, with 
terms such as “air,” “blood,” and “medicine,” are unique to HaTzfira and reflect 
the initial scientific motivation for the publication of the journal. Religion and 
science were mentioned in the previous section as monumental themes, but the 
other six topics would define the journal’s “feminine time”: its repetitive, constant, 
and reproduced items. 

Figure 3: The temporal shift in the distribution of constant topics, 1883-1885. Each line 
signifies a single topic, listed in the legend. The value at each point marks the percentage of 
all the content during the relevant period. 

Figure 4: The temporal shift in the distribution of constant topics, 1886-1888. Each line 
signifies a single topic, listed in the legend. The value at each point marks the percentage of 
all the content during the relevant period. 

The aforementioned themes were similarly relevant in both eras, but the 
daily introduced two new themes: literary content (topics 9 and 12) with terms 
such as “my heart,” “his eyes,” and “my soul”; and economics (topics 2 and 18), 
with terms such as “ruble,” “liter,” “the price,” and “the trade.” 

Literary content entered as a separate section of the journal in the third 
installment of the daily journal on April 15, 1886, although Slonimski had earlier forbidden 
publishing such content.23 The first story, entitled “Yerushalima” (To Jerusalem), 
described the longing of a dying European Jew for the land of Israel.24 The 
topics related to the literary theme reflect two types of content. Topic 9 is related 
to published prose,25 while topic 12, with its emphasis on verbs in the first person, 
reflects personal diaries and poems.26 The integration of the feuilleton and literary 
content on a daily basis helped meet the challenge of producing enough 
content for a daily, as it could be prepared with no connection to current events. 


23 HaTzfira, February 4, 1862, 8. 
24 Dober Rabinowitz, “Yerushalima,” HaTzfira, April 15, 1886, 2-3. 
25 Abraham Zuckerman, “Eshet Khayil,” HaTzfira, June 21, 1886, 2-4. 

The major change in content was not literary material but rather economics-oriented 
texts. While the economy was rarely discussed during the weekly 
era, it occupied 50% of the journal’s content from 1886 onwards. As we will see 
in the analysis of the changing topics, the economy became the main feature of 
the daily journal. Two topics related to economic discourse appeared constantly 
throughout the daily era. Topic 2 connects financial terms such as “money,” 
“trade,” and different currencies with political terms such as “government.” 
This topic relates to generalized issues of political economy. For example, a particular 
issue of HaTzfira,²⁷ of which 33% was affiliated with topic 2, included an 
article on military expenses as its main international news segment, an article 
on taxes as its main internal news segment, and an article on finances of Jewish 
philanthropy as its main segment on Jewish affairs. Topic 18, on the other hand, 
includes commercial terms such as “price” and “rate” and different commodities 
such as “sugar” and “flour.” This topic reflects various commercial indices that 
were published on the back page of the newspaper. 

In contrast to the relative similarity between the content of the constant topics 
in both eras, the temporary topics signify a real change in discourse. Figure 5 depicts 
the temporary topics in the weekly era, and Figure 6 depicts the temporary topics 
in the daily era. 

In general, the weekly era can be characterized by an ongoing discussion 
of European Jewish affairs in connection with world politics. In fact, there is no 
topic that is not connected in some way to the Jewish people.²⁸ During the first 
year (1883), topics 3, 7, 11, and 18 consist of various issues around libels and rising 
antisemitism. This follows the pogroms of 1881-1883 in southern Russia, which had 
a tremendous effect on Eastern European Jewry. During the second year (1884), 
topics 9, 10, 13, and 19 partially reflect a search for a solution for the Jewish 
people; thus, for example, topic 10 connects Moses Montefiore, the Jewish philanthropist, 
with the British colonial regime. During the third year (1885), topics 
2, 12, 14, and 15 are not particularly identifiable but are still connected to general 
Jewish politics and issues. 


26 Israel Saba, “Amarti yesh li Tikva,” HaTzfira, September 29, 1886, 2-3; “Berosh Homiyot,” 
HaTzfira, September 13, 1888, 2. 

27 HaTzfira, September 27, 1886. 

28 Segal and Soffer, “One Journal, One Decade.” 

Figure 5: The temporal shift in the distribution of temporary topics, 1883-1885. The value at 
each point marks the percentage of all the content during the relevant period. 


What is even more striking is the wave-like pattern of the graph in Figure 
5, as the rise of each journalistic theme or topic heralds the decline of another. 
Furthermore, the main transitional topics of HaTzfira reach similar peaks at 
approximately 8% of the journal’s discourse during a single half-year period 
before being superseded by another topic. This temporal pattern echoes Franco 
Moretti’s conceptualization of shifts in literary genres. In Graphs, Maps, Trees, 
Moretti refers to the temporal and ephemeral nature of genres: “the new form 
makes its appearance to replace an old form that has outlived its artistic usefulness 
..., and the decline of a ruling genre seems indeed here to be the necessary 
precondition for its successor’s takeoff.”²⁹ Moretti describes the life cycles of 
genres as waves, in which “a rather regular changing of the guard takes place, 
where half a dozen genres quickly leave the scene, as many move in.”³⁰ In terms 
of time, the “present” of the weekly tended to operate like a metronome: some 
topics were continuous and constant, while others consistently appeared and 
disappeared. 


29 Franco Moretti, Graphs, Maps, Trees (London: Verso, 2005), 14. 
30 Moretti, Graphs, Maps, Trees, 18. 

Figure 6: The temporal shift in the distribution of temporary topics, 1886-1888. Each line 
signifies a single topic, listed in the legend. The value at each point marks the percentage of 
all the content during the relevant period. 

Unlike the wave-like and uniform pattern of news cycles during the weekly 
era, Figure 6 reflects a chaotic change of news items in the daily era. Some topics 
have several peaks, others have relatively long decay periods, and many of them 
overlap. In addition, the peak levels are different for each topic. 
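
The contrast between wave-like and multi-peaked topics can also be expressed numerically. What follows is a minimal Python sketch under stated assumptions, not the procedure used in this study: it assumes a hypothetical list of dated documents, each with per-topic weights of the kind a topic model outputs, aggregates them into half-year shares, and counts the local peaks of each topic. The function names `half_year`, `topic_shares`, and `count_peaks`, as well as the input layout, are illustrative and do not come from the original analysis.

```python
from collections import defaultdict
from datetime import date


def half_year(d):
    """Map a date to a (year, half) bucket, e.g. (1886, 1) for January-June 1886."""
    return (d.year, 1 if d.month <= 6 else 2)


def topic_shares(docs):
    """Aggregate per-document topic weights into shares per half-year period.

    docs: iterable of (date, {topic_id: weight}) pairs (hypothetical layout).
    Returns ({topic_id: [share for each period]}, sorted list of periods).
    """
    totals = defaultdict(float)                          # total weight per period
    per_topic = defaultdict(lambda: defaultdict(float))  # weight per topic and period
    for d, weights in docs:
        period = half_year(d)
        for topic, weight in weights.items():
            totals[period] += weight
            per_topic[topic][period] += weight
    periods = sorted(totals)
    shares = {
        topic: [per_topic[topic][p] / totals[p] for p in periods]
        for topic in per_topic
    }
    return shares, periods


def count_peaks(series):
    """Count strict local maxima; an endpoint is compared with its single neighbour."""
    peaks = 0
    last = len(series) - 1
    for i, value in enumerate(series):
        left = series[i - 1] if i > 0 else float("-inf")
        right = series[i + 1] if i < last else float("-inf")
        if value > left and value > right:
            peaks += 1
    return peaks
```

Under these assumptions, topics that each show a single peak of similar height would correspond to the wave-like pattern described for the weekly era, whereas topics with several peaks of differing heights would correspond to the more chaotic pattern of the daily era.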

The change from weekly to daily is apparent not only in the pace and rhythm 
of news production but also in the actual content. Of the nine changing topics 
depicted in Figure 6, only two are not related to the economy (topics 3 and 
11).³¹ This recurring theme reflects a growing commercialization of the Hebrew 
journal, which was part of the capitalistic commodification of time within the 
commercial project of periodical print. As stated by Sommerville, “the adoption 
of weekly and then daily schedules for marketing information began developments, 
the implications of which are still being worked out. Periodicity allowed 
information to become a business, where it had once been a part of personal 
relations.”³² This process of commodification escalated in the daily newspaper. 
As the frequency of production increased, more and more space was devoted 
to advertisements. The first daily newspaper, the Daily Courant (1702-1735), for 
example, devoted about one-half of its space to advertisements.³³ Similarly, 
many of the topics in HaTzfira’s daily discourse relate to straightforward journalistic 
finances. Topic 4, for example, includes financial terms from the paratext 
of HaTzfira’s front page, emphasizing the commodification of news and the 
newspaper as the agent providing these news items. Topics 10, 15, and 20 relate 
to the advertisements in HaTzfira. Topics 10 and 15 are connected to commodities 
for Jewish holidays, such as fruit from Egypt, Passover flowers, and wine.³⁴ 
Topic 20 is connected to advertisements in Yiddish, which appeared sporadically 
throughout the period.³⁵ 


31 Topic 3 discusses the Bulgarian crisis of 1885-1888 and topic 11 is related to the reports from 
Berlin regarding the death of the German emperor Wilhelm I on March 11, 1888. 

As can be seen, financial issues soon entered into the journalistic texts 
(topics 5, 6, and 7). Topic 5, peaking at the beginning of 1886 (with terms such as 
“guarantee,” “loan,” and “departments,” together with mention of various European 
currencies), identifies an early trend immediately following the change 
into daily publication.³⁶ During this stage, economic texts were separated from 
other parts of the newspaper, located in the back pages. The full integration of 
financial discourse within other parts of the newspaper occurred in 1887. Topic 
6, for example, reflects the influence of financial reasoning on Jewish polemics 
through the introduction of statistics, primarily economic statistics. The 
paper did this by encouraging readers to send figures and data concerning the 
finances of their own communities in answer to antisemitic accusations of the 
economic inefficiency of the Jewish people.³⁷ Topic 7 reflects a broader and more 
integrated influence of economic discourse on the journal. Articles related to this 
topic combine industrial terms such as “factory,” financial terms such as “guarantee,” 
and political terms such as “emperor” and “Jews.” 


32 Charles J. Sommerville, The News Revolution in England: Cultural Dynamics of Daily Information 
(Oxford: Oxford University Press, 1996), 161. 

33 Will Slauter, “The Rise of the Newspaper,” in Making News: The Political Economy of Journalism 
in Britain and America from the Glorious Revolution to the Internet, ed. Richard R. John and 
Jonathan Silberstein-Loeb (Oxford: Oxford University Press, 2015), 31. 

34 HaTzfira, August 19, 1887, 4; HaTzfira, January 23, 1888, 4. 

35 HaTzfira, April 1, 1887, 6-7. 

36 HaTzfira, May 13, 1886, 4. 

37 “Ma-Yif’al Israel,” HaTzfira, November 28, 1887, 2-3. 

38 Soffer, “Paper Territory.” 


Table 1: The ten prominent terms in each topic after removing stop words, and a defining title 
based on the full bank of terms, 1883-1885. 

Topic | Title | Prominent terms
1 | World politics | the Jews, Germany, government, Bismarck, England, [written] by, newspapers, Germans, in Germany, France
2 | World politics | newspapers, Bulgarians, natives, Alexander, students, the sick, iron, Kopeks, Jews, Austria
3 | Jewish politics and rising antisemitism | the emperor, Israel, the minister, France, the French, Russia, the war, the prayer, ruble, in honor of
4 | Jewish politics | the Jews, the government, Jews, Russia, Jews [of], newspapers, the Jew, to the Jews, government [of], the Christians
5 | The Land of Israel | God, Jerusalem, Zion, our brothers, settlement, in Jerusalem, the Land of Israel, the Ga'on, to God, houses
6 | Jewish issues | Israel, God, trade, in our Land, religion, the charity, our brothers, the people of Israel, in the lands of, 20, the money
7 | Antisemitism | Israel, Stoecker, the minister, Montefiore, China, France, the king, the freedom, libel, army
8 | General paratext | God, ruble, rabbi, aforementioned, 10, books, in the language of, Torah, franc, of blessed memory
9 | Jewish issues | bad, the Talmud, the republic, million, kopeks, doctor, the Jews, the rabbis, the order, the movement
10 | The Land of Israel | the minister, Montefiore, the parliament, the colonization, the English, the council, the Land of Israel, Solomon, Germany, synagogue
11 | Jewish politics and rising antisemitism | Israel, aforementioned, the government, Austria, the Christians, the elected, France, the Talmud, the bitter enemy, The Jew
12 | Jewish commodities | Passover, the kosher, flour, the Passover, the enlightenment, Italy, the year, Gordon, the nature
13 | Jewish commodities | citrons, myrtles, France, China, families, equal, the French, trade, purchase, Egypt
14 | Jewish politics | the minister, Stoecker, to us, Salisbury, the ministry, Rothschild, the lord, in England, citrons, the Holy Land
15 | World politics and more | England, the elected, Russia, 25, mit [Yiddish], 50, Gladstone, 12, 16, 10
16 | The pogroms | the Jews, the government, the army, newspapers, the riots, the mob, fire, the trade, Europe, the end
17 | Nahum Sokolov's columns | the English, Russia, war, the address, army, the truth, sea, letter, the holy, and us
18 | Antisemitism | the law, the trial, the judges, the blood, the boy, the libel, the witnesses, Esther, libel, Moritz
19 | World politics and more | Israel, the parliament, Bismarck, the laws, cult, Lasker, the community, Egypt, the ministry, the elected [of]
20 | Science | [written] by, the nature, the sun, ruble, the price [of], water, between them, the sky, 50, Austria


Table 2: The ten prominent terms in each topic after removing stop words, and a defining title 
based on the full bank of terms, 1886-1888. 

Topic | Title | Prominent terms
1 | World politics | Bismarck, the army, newspaper, France, Russia, the government, government of, Ashkenaz, Europe, God
2 | Political economy | the government, ruble, the trade, the value, kopek, the money, franc, trade, Warsaw, in our land
3 | World politics | Bulgaria, England, December, temporal, the army, October, newspaper, government, Russia, the government
4 | Paratext | 100, June, the Jews, May, 50, publication, 10, the loan, demanded, abroad
5 | Finance | 100, May, 40, journey, July, to God, June, guarantee, Greece, one hundred
6 | Jewish statistics | providers, 10, November, December, October, Israel, Berlin, Paris, traders, 15
7 | Trade and industry | 10, the emperor, July, 100, Wilhelm, June, August, God, 50, Shimon, Factory
8 | Modern Jewish discourse | Israel, our brothers, the readers, the authors, see, in HaTzfira, we are, the enlightenment, the religion, articles
9 | Prose | his eyes, the woman, loud, answered, called, woman, rabbi, will say, stand, sit
10 | Advertisements | August, September, Israel, July, p. (acronym), 10, Bulgaria, pomegranates [of], pomegranates, number, the Jews
11 | World politics | the emperor, April, 10, Berlin, May, Wilhelm, Bismarck, Friedrich, Boulanger, newspaper
12 | Prose | my heart, I knew, my soul, my master, in me, I said, I saw, my brother, and I, I will be able to
13 | Rabbinical discourse | God, Israel, rabbi, the congregation, the Torah, our brothers, ruble, to God, the deceased, Yitzhak
14 | Paratext | kopek, 15, g' (acronym), ruble, 24, florin, saw, [written] by, fire, the time
15 | Advertisements | God, February, January, Passover, rabbi, 10, Berlin, Warsaw, the Passover, flour
16 | Modern Jewish discourse | the Jews, God, Israel, our brothers, the government, newspapers, about, doctor, the Christians, Jew
17 | Science | by, the body, the air, the disease, the blood, the medicine, occasionally, physicians, flesh, disease [of]
18 | Economics | ruble, liter, the price, the sugar, kopek, this week, 50, the traders, pood, the rate
19 | Rabbinical discourse (acronyms) | vcu' [et cetera], z"l [in blessed memory], s’, b’, but, g', a’, issue, that is, c"a [each and every one]
20 | Advertisements (Yiddish) | mit, auch, fuer, im, dem, ein, werden, Fabrik, wie, Freisen


5 The Editors’ Perspective on Periodical Time 


At the core of the previous analysis lies the question of periodical time, and more 
specifically its defining role in conceptualizing the present. The editors of HaTzfira 
were well aware that the shift from weekly production to daily production 
required them to accelerate the presentation of news to the readers, an awareness they 
expressed in short quarterly bulletins to the readers, published on the front page. 
These bulletins acted as policy declarations on behalf of the editors, expressing 
their commitment to shortening the time of news production and newspaper 
delivery. The summarizing bulletin of 1886,³⁹ for example, states that the paper 
will provide “current news and perceptions, brought by well-known writers that 
expediate all their reports to our readers as soon as they can” (emphasis in the 
original). In addition, it acknowledges the need to fill the pages of the newspapers, 
but at the same time expresses the obligation to provide “valuable content 
in each issue.”⁴⁰ 

The need for a more rapid cycle of news did not allow the editors to maintain 
the same themes discussed in the weekly era. The most dynamic and fluctuating 
content available to the editorial board was economic information. Data on com- 
mercial prices, global currencies, and stock market indices changed on a daily 
basis. Although it is hard to believe that most Jewish readers had any need for 
such information, it structured their time. The transformation from Jewish poli- 
tics (the main theme in the weekly era) to economics (the main theme in the daily 
era) was initially a structural change, due to the introduction of more advertise- 
ments and specialized economic and financial columns in the back pages. Over 
time, economic discourse found its way into the general journalistic discussion. 

While topic modeling identifies the growing role of economics within the discourse 
of HaTzfira, the quarterly bulletins show very little sign of this change. The 
bulletins, which reflect the editors’ overview of their work, emphasize continuity 
from earlier periods of the journal, stressing its original scientific and political 
orientation. The “dailiness” of the journal is seen mainly in the editors’ success 
in maintaining its traditional nature on a daily basis. However, the editors do 
reflect on one particular change in content, without ever mentioning its novelty: 
the introduction of literary columns within the daily HaTzfira. By September 
1887,⁴¹ the editors define the journal’s mission as being “a political, scientific 
and literary newspaper” (emphasis not in the original). They seem unaware of the 
new economic hegemony. 


39 HaTzfira, December 28, 1886, 1. 
40 HaTzfira, July 1, 1887, 1. 
41 HaTzfira, September 30, 1887, 1. 

The single exception is the quarterly bulletin of March 30, 1887.⁴² In it, the 
editors explicitly acknowledge the important role of “reports on the world of commerce 
and news related to daily work of the people of Israel.” In addition, they 
emphasize the value of advertisements, “especially for those dealing with the 
people of this great metropolis [Warsaw].” This unique expression of the commercialization 
of the newspaper and its dependence on urbanized readership corresponds 
with our analysis based on computational distant reading. Although this 
acknowledgment appears in no earlier or later bulletin, it reveals the editors’ underlying 
awareness of these new circumstances and reality. 


6 From “Masculine Time” to “Feminine Time” 


January 5, 1886, was the date of publication of the last issue of the weekly HaTzfira. 
Three months later, on April 13, 1886, the new daily HaTzfira was published, 
with very little apparent change. The subtitle of the newspaper changed from 
“published weekly” to “published daily,” and the length of the newspaper was 
shortened from eight pages to four. Because the editorial board, consisting 
of Slonimski and Sokolov, never changed, and nothing much occurred 
in world affairs during this time of transition, we could assume that the journals 
would remain almost identical. However, this study shows that the transformation 
from weekly to daily had far-reaching consequences for the rhythm and 
content of periodical time. The findings of the topic-modeling algorithm, as can 
be seen in the vector analysis (Section 3), provide evidence that the shift in the 
cycle of publication had a substantial effect on the social discourse presented in 
this periodical. The computational analysis shows that the difference between 
the two discourses was almost as great as that between two different newspapers. 

The differences were related not only to the content of the topics but also 
to their structure, cycles, and duration. While in the weekly era we see a wave- 
like pattern of similarly important topics replacing each other, in the daily era we 
see a much more chaotic nature of changing topics as well as a larger number of 
unvarying topics. 


42 HaTzfira, March 30, 1887, 1. 

As Beetham argues, there is always a dependency between time and money 
within the journalistic discourse.⁴³ In the daily, this dependency was radicalized, 
as information became a business.⁴⁴ In HaTzfira we see several layers of this commodification. 
The first layer relates to the new social function of the newspaper as 
the provider of information rather than reading material. In order for the daily to 
be relevant and commercially valuable to the readers, this information had to be 
printed before the readers received it through other means. The second layer is an 
outcome of the first layer because the most available, updated, and dynamic type 
of information the journal could publish was commercial and economic information. 
As a result, the orientation of the journal’s discourse gradually shifted to economics. 
The third layer of the commodification is the growing proportion of advertisements 
in the newspaper. This was an outcome of the combination of several 
factors: the rising costs of daily publication required new sources of funding; the 
daily appearance of the newspaper created an added value for the advertisers, as 
the newspaper provided ongoing exposure to the readership; and the shift of the 
discourse to economics created a suitable environment for the commercialization 
of the printed space. 

Frequency affected not only news cycles and journalistic content, but also the 
genre of journalism. The new social and commercial function of the daily newspaper 
was different from the distant and interpretive role of the weekly. While the 
Jewish weekly was often perceived as a periodical book that could be read years 
after the publication, the daily was designed to affect the immediate, everyday 
environment of the readers. The dailies were designed to capture readers’ attention 
during the daily commute to work.⁴⁵ Thomas P. O’Connor, a 19th-century journalist, wrote 
in 1889 that “we live in an age of hurry and of multitudinous newspapers. The 
newspaper is not read in the secrecy and silence of the closet as is the book. It is 
picked up at a railway station, hurried over in a railway carriage, dropped incontinently 
when read.”⁴⁶ Rather than the summary and evaluation of distant events 
provided in the weekly era, the daily dealt with the everyday life and needs of 
the readers. This was manifested in a shift from politics and Jewish discourse to 
economics, as well as the dominant role of advertisements. 

Another aspect of the fulfillment of everyday requirements was the 
introduction of literary columns. Similar to economic discourse, literary discourse 
was introduced partly as a result of institutional circumstances, since such 
texts were available and could be prepared in advance to fill the newspaper. However, 
such content also related to the new personal perspective of the daily newspaper. As 
reflected in the prominent words of the relevant topics (9 and 12), the vocabulary 
was personal and related to mundane personal domains. Such style was entirely 
different from the dominant scientific style of the weekly era. This could reflect a 
much-needed balance within the daily newspaper between an instrumentalized 
economic style and a personal literary one. 


43 Margaret Beetham, “Towards a Theory of the Periodical as a Publishing Genre,” in Investigating 
Victorian Journalism, ed. Laurel Brake, Aled Jones, and Lionel Madden (London: Palgrave 
Macmillan, 1990), 19-32. 

As Turner observes, varying frequencies of journal publication create divergent 
time cycles that relate differently to everyday lives.⁴⁷ As we have seen, the 
weekly era was characterized by a metronome-like change of topics. Relatively 
few topics were constant, while others superseded each other in a uniform wave-like 
pattern. In contrast, the daily era was characterized by a larger number of 
constant topics while the dynamic topics reflect chaotic change. This can be 
explained by the need to create a sense of continuity within a daily cycle of journalistic 
production. The daily cycle’s lack of journalistic perspective forced the 
creation of stabilized indexing within an unpredictable daily atmosphere. This 
indexing is manifested in the constant topics, which reflect Beetham’s “feminine 
time.”⁴⁸ This finding is not intuitive and stands in contradiction to Sommerville’s 
assumption that constant change is reflected within daily production of journals.⁴⁹ 
While the weekly perspective enabled the production of uniform change, 
the rapid information provided in the daily was structured as if it were uniform 
continuity. This structural uniformity allowed the daily newspaper to contain relatively 
few changing topics, which reflect Beetham’s “masculine time.” 

The daily HaTzfira was not restricted to a single sense of periodical time; it 
encompassed monumental, feminine, and masculine times simultaneously. The 
need to discipline the rapid change of information created structured journalistic 
formats with identified spaces within the printed topography, which elevated 
“repetitive and circular” representation of “linear and progressive” information. 
In this way, it challenged the binary distinction between feminine and masculine 
times. Periodicals in general, and dailies in particular, were significant in the 
construction of the “present.” However, the meaning of this “present” was rarely, 
if ever, constant and coherent. 

The case study of HaTzfira demonstrates the advantages of computational tools 
and methods in juxtaposing two areas of interest in current Jewish Studies: the 
reliance of Jewish societies on communication networks and the significance of 
time in Jewish tradition. 

Due to the diasporic nature of Jewish society, communication networks 
assumed an exceptional significance: they were not only the means of creating 
an “imagined community” but also a central element in maintaining the actual, real 
community. “The dispersed, centerless Jewish world remained connected and in 
many ways intact into the modern era, based on innovative and effective media 
arrangements and communication strategies,” claims Blondheim.⁵⁰ The importance 
of communication has resulted in extensive research on Jewish media in 
the Modern Age. Those same circumstances caused Abraham Joshua Heschel to 
distinguish between the “space-minded man” and the Jew and claim that Judaism 
is a “religion of time aiming at the sanctification of time.”⁵¹ Consequently, Jewish 
Studies in the last two decades has seen a rising interest in time-related research, 
and in particular diverse forms of temporality in Jewish culture.⁵² 

However, despite the fact that much of the work in both areas of interest has 
dealt with new media and digital technology, their analysis remains mostly qualitative 
and non-digital. Computational analysis provides new perspectives on each of 
these fields individually and together. 


Abstract: The article offers an example of an argument-driven data analysis of 120 
petitions issued in 1971-1972 by a Soviet Jewish emigrant from Minsk by the name 
of Ernst Levin (1934-2016). In Levin’s case, petitions became a principal instrument 
of his struggle for emigration from the Soviet Union, which lasted altogether 
582 days. Testing the ggmap and ggplot2 packages available for the R programming 
language on historical data, this study intends to visualize and consequently reconstruct 
the manner and intensity of Levin’s communication with Soviet authorities 
and international organizations. In doing so, it approaches emigration as a 
process, scaling and visualizing its duration. Displaying and highlighting the 
changes in the tactics of petitioning over time, the graphs and visualizations 
presented in the article allow Ernst Levin’s emigration efforts to be considered from a 
multi-dimensional perspective of political and public actors, places, organizations, 
individual decisions and collective actions. 


1 Introduction 


In 1968, a Soviet engineer by the name of Ernst Markovich Levin (b. 1934) resolved 
to renounce his USSR citizenship and immigrate to Israel together with his wife 
Asia and their son Gosha. Just a few years earlier, his life had resembled that of the 
typical Soviet intelligent of the time. He was born into and brought up in a family 
of devoted communists.¹ A graduate of the Belarusian Polytechnic Institute, one 
of the leading technical higher education institutions in the USSR, Ernst Levin 
made a successful professional career in the building construction industry.² 
Together with his family he occupied a separate apartment in the very center of 
Minsk, then the capital of the Belarusian Soviet Socialist Republic (BSSR), and 
lived a relatively prosperous life.³ Notwithstanding appearances, throughout 
the 1960s, Ernst Levin developed a critical attitude towards the idea of socialism, 
became fascinated by Jewish culture, and learned Hebrew so well that he 
intended to become a Hebrew teacher.⁴ 

Once he decided to emigrate, Levin eagerly invested his skills and abilities in 
the realization of this aim, as emigration from the USSR was a lengthy and tedious 
process. Potential emigrants had to apply for an exit visa and frequently 
faced multiple refusals before permission was finally granted; hence they became 
known as “refuseniks.”⁵ While struggling for the right to emigrate, Levin adopted 
the credo “Not a day without a line” (Ni dnia bez strochki)⁶ and decided to concentrate 
his efforts on writing and making formal requests (known as petitions) to 
the Soviet authorities and international organizations, thereby hoping to attract 
attention to his case. 


1 For a discussion of the relationship between the first and second generations of Soviet Jews and 
their attitude to the “Jewish” and the “Soviet,” see Yuri Slezkine, The Jewish Century (Princeton, 
NJ: Princeton University Press, 2019), especially parts 3 and 4. 

2 For the history of the institute (now university) see K.I. Balandin et al., Istoriia Belorusskogo 
natsional’nogo tekhnicheskogo universiteta (History of the Belarusian National Technical University) 
(Minsk: BNTU, 2010). 

3 Ernst Levin, I posokh v ruke vashei. Dokumental’nyi memuar 2002 goda (k tridtsatiletiiu iskhoda 
iz SSSR) [And the Staff is in Your Hand. A Documented Memoir of 2002 (To the 30th Anniversary of 
Exodus from the USSR)] (Jerusalem: n.p., 2007), 11-13. 

4 Levin, I posokh, 89-93. FSO, F. 30.45 (Levin), 2-85. Compare to: Ann Komaromi, “Between Two 
Worlds: Late Soviet Jews in Leningrad,” East European Jewish Affairs 48 (2018): 23-40, accessed 
January 13, 2022, https://doi.org/10.1080/13501674.2018.1442046. On the teaching and learning 
of the Hebrew language in the USSR see Mark Drachinsky, “A Brief Survey of the History of Hebrew 
Teaching in USSR,” in Jewish Culture and Identity in the Soviet Union, ed. Ya'acov Ro'i and 
Avi Beker (New York: NYU Press, 1991), 246–54. 

5 Like many other Soviet migrants, the Levin family experienced multiple refusals, hatred, suspicions, 
various bureaucratic hurdles, and financial difficulties before procuring an exit visa. 
Larissa Remennick called this condition an “economic, legal and political” limbo. See Larissa 
Remennick, Russian Jews on Three Continents: Identity, Integration, and Conflict (New Brunswick, 
NJ: Transaction Publishers, 2012), 39. 

Simultaneously, Ernst Levin meticulously documented the whole process of 
emigration (see Figure 1 for an example of his documentation). He collected an 
extensive archive of his and his family’s departure, including correspondence, 
newspaper clippings, address lists, and personal notes, which he was able 
to smuggle out while leaving the USSR. Owing to this careful documentation, the 
main bulk of which is made up of individual and collective petitions, it is now 
possible to reconstruct the whole process in detail.⁷ 

In the existing scholarship, the petitions of Soviet citizens have been studied 
mostly from the perspective of social history, which to a significant extent allowed 
historians to revise Soviet history from the standpoint of ordinary citizens, 
bringing to light their agency, even if this was strongly restricted in the totalitarian 
(authoritarian) state.⁸ Yet a combined qualitative and quantitative analysis of 
Soviet petitions based upon their structural characteristics is still rare. One possible 
reason for this is the extremely high number and diversity of petitions, which 
makes a more precise statistical analysis difficult for historians to handle. Second, 
the sources are scattered across different post-Soviet archives, some of which 
are still classified or difficult to access. An alternative solution, as this article 
suggests, is to apply data analysis to a small-scale but coherent dataset, where 
the materials are distributed over a specific period of time. 

In what follows, I offer an example of argument-driven digital historical analysis
of one particular emigration case, combining a close computational reading
of petitions with a traditional study of sources. I intend to show how the tactics of
petitioning changed over time, and how the practices of both resistance and
adaptation were adopted as instruments in the struggle for emigration.

My computational analysis is based upon a small dataset derived from Levin's 
petitions and uses the R programming environment, in particular the ggplot2 


6 I define petitions as all petition-like written and oral appeals to the authorities. Compare to 
Hale Yilmaz, "Petitions as a Source in Women's History of the Republican Period," in Women's 
Memory: The Problem of Source, ed. Fatma Türe and Birsen Talay Kesoglu (Newcastle upon Tyne: 
Cambridge Scholars Publishing, 2011), 81. 

7 FSO, F. 30.45 (Levin). The other significant source has been Ernst Levin's recollections. Levin, 
I posokh. 

8 Yilmaz, "Petitions as a Source," 81-83. Compare also to the recent study on petitioning prac- 
tices in late-Salazar Portugal: Duncan Simpson, "Approaching the PIDE ‘From Below’: Petitions, 
Spontaneous Applications and Denunciation Letters to Salazar's Secret Police in 1964," Contem- 
porary European History 30, no. 3 (2021): 398-413, doi:10.1017/S0960777320000612. 
package, to visualize its characteristics.⁹ By applying R to a relatively small historical
dataset we can better understand emigration as a process, and measure and
visualize its duration. Moreover, it allows us to place an individual case within a
multi-dimensional framework of political and public actors, places, organizations,
individual decisions, and collective actions. It thus helps to arrive at a more
nuanced understanding of the practice of petitioning and simultaneously opens
an opportunity for cautious generalization. The latter is especially rewarding
when dealing with unevenly represented sources, as is the case with regard to the
history of the Jewish refuseniks in Minsk.¹⁰

Though petitioning characterized the relationship between citizens
and the state throughout the whole Soviet period,¹¹ from the late 1960s onward
petitions also mirrored the emerging political participation in the USSR.¹² By
applying visual and data analysis in this study I will explore the dynamics of
protest and interaction that arose in Levin's appeals. The position of a potential
emigrant in Soviet society, who was simultaneously treated as a member and
an outsider, was an additional factor that greatly affected the character of this
dynamic.

The article begins with a brief note on emigration policies in the Soviet Union 
and the BSSR and discusses in some detail Ernst Levin's emigration story so as 


9 The use of R for these purposes is the author's choice, based on her understanding and skills. 
Arguably, a similar result can be achieved by other means. The purpose of this article is not to 
discuss the advantages of the R programming language for historical research but to combine 
methods of traditional source study and data analysis visualization to achieve a more nuanced 
understanding of the history of Jewish emigration from the USSR. 

10 The utility of digital methods for the study of Soviet history, when some archives are still
closed and the sources are difficult to access, has been discussed in Susan Grunewald, “A Push
for Digital History in Soviet and Post-Soviet Studies,” NYU Jordan Center for the Advanced Study
of Russia, accessed July 15, 2021,
https://jordanrussiacenter.org/news/a-push-for-digital-history-for-soviet-and-post-soviet-studies/#.YSjUGogzaUm.

11 Sheila Fitzpatrick, “Editor’s Introduction: Petitions and Denunciations in Russian and Soviet 
History,” Russian History 24, no. 1-2 (1997): 5-6. See also, in the same issue, Golfo Alexopoulos, 
“The Ritual Lament: A Narrative of Appeal in the 1920s and the 1930s,” 117-29. It is worth noting 
that writing petitions had a long tradition in Russian and subsequently Soviet history, yet its 
premise had changed in the last decades of Soviet rule. 

12 On April 12, 1968, the Presidium of the USSR Supreme Soviet signed the “Decree on the Procedure
for the Consideration of Proposals, Declarations, and Complaints of Citizens.” Ukaz N
2534-VII Prezidiuma Verkhovnogo Soveta SSSR o Poriadke rassmotrenia predlozhenii, zaiavlenii
i zhalob grazhdan, accessed February 13, 2021, http://docs.cntd.ru/document/9012207. On political
communication see also Stephen White, “Political Communications in the USSR: Letters
to Party, State and Press,” Political Studies 31 (1983): 43-60 and Margareta Mommsen, Hilf mir,
mein Recht zu finden. Russische Bittschriften von Iwan dem Schrecklichen bis Gorbatschow (Berlin:
Propyläen, 1987), 217-57.
to relate his petitioning activity to a broader context. After that, it considers petitions
as the main weapon of Ernst Levin's struggle for emigration, and explains
how the petitions could be turned into a dataset. I then explore and analyze
various groups of visualizations created in R to probe the distribution of Levin's
petitions over time, their different addressees and types, as well as the correlations
between them.


2 Emigration Policies in the Soviet Union 
and the BSSR 


In the 1970s-1980s, 240,000 Jews left the Soviet Union for Israel, the USA, and
elsewhere.¹³ Although the USSR signed, already on December 10, 1948, the Universal
Declaration of Human Rights affirming the freedom of movement,¹⁴ this
right was not explicitly enshrined in the Soviet Constitution.¹⁵ Moreover, even in
the post-Stalin period, emigration from the USSR, if no longer subject to criminal
prosecution, was openly disapproved of and condemned. Only with the onset of
perestroika policies in 1987-1989 did free emigration become possible.

The success of emigration depended on multiple factors, such as international
pressure and the goodwill of the local authorities. As a rule, emigration
was a lengthy and challenging procedure, and the number of those allowed to
emigrate fluctuated significantly.¹⁶ To leave, one had to apply for an exit visa at
the local Department for Visas and Registration (OVIR). A substantial percentage
of first-time applications was refused on various formal grounds. The
applicants who were not granted permission to emigrate were colloquially labeled
“refuseniks” (otkazniki in Russian).¹⁷ Refuseniks sometimes stayed in the Soviet Union
for several years, struggling for the opportunity to emigrate. Often, they

13 The figures are from Zvi Gitelman, A Century of Ambivalence: The Jews of Russia and the Soviet
Union, 1881 to the Present, 2nd ed. (Bloomington: Indiana University Press, 2001), 185.

14 The Universal Declaration of Human Rights, 13.1 and 13.2. 

15 Konstitutsia (Osnovnoi zakon) Soiuza Sovetskikh Sotsialisticheskikh Respublik 5 dekabria 
1936 goda (The Constitution of the Union of Soviet Socialist Republics adopted on December 5, 
1936), accessed March 2, 2021, http://www.hist.msu.ru/ER/Etext/cnst1936.htm. The next Consti- 
tution in the USSR was adopted on October 7, 1977. 

16 See, for instance, the numbers of emigrants by years: “Total Immigration to Israel from the
Former Soviet Union (1948-present),” accessed February 5, 2021,
https://www.jewishvirtuallibrary.org/total-immigration-to-israel-from-former-soviet-union.

17 On the definition of refuseniki as a group see, Mordechai Altshuler, “Who Are the ‘Refuseniks’?
A Statistical and Demographic Analysis,” Soviet Jewish Affairs 18, no. 1 (1988): 3-15. Vladimir (Ze'ev)
were deprived of occupation, income, and education and faced condemnation in
public life and on the pages of Soviet newspapers.¹⁸

At the same time, from the end of the 1960s, protest activity in favor of emigration
started to gain force and attracted attention in the Soviet Union and internationally.
As a result of the latter, the so-called Jackson-Vanik Amendment, which
was designed to support human rights in the USSR and put pressure on Soviet
economic interests, was adopted by the US Congress in early 1975.¹⁹ Another significant
initiative was the signing of the Helsinki Final Act in August 1975, which
many perceived (at least initially) as a step towards democracy
and civic freedoms in the Soviet Union.


3 Setting the Context: The Emigration Story 
of the Levin Family 


As was already noted, Ernst Levin was a true Soviet citizen in his youth and
expressed no particular interest either in Yiddish, the traditional language of the
local Jewish population, or in Jewish culture.²⁰ The change in his Weltanschauung
was a result of both the political liberalization of the late 1950s-1960s in the USSR²¹
and personal circumstances. As a typical representative of the 1960s generation,
Ernst Levin experienced the hardship of World War II and Stalinism, yet later also
the loosening of ideological constraints in the Soviet Union in its prime.²² He was
also affected by the rise of everyday anti-Semitism in Soviet society and recalled


Khanin, “The Refusenik Community in Moscow: Social Networks and Models of Identification,” 
East European Jewish Affairs 41, no. 1-2 (2011): 75-88, doi:10.1080/13501674.2011.591661. 

18 “Timeline of the Jewish Movement in the Soviet Union,” University of Toronto Libraries, accessed
October 17, 2020,
https://samizdatcollections.library.utoronto.ca/content/timeline-jewish-movement-soviet-union.

19 Barbara Martin, “The Sakharov-Medvedev Debate on Détente and Human Rights: From the
Jackson-Vanik Amendment to the Helsinki Accords,” Journal of Cold War Studies 23, no. 3 (2021):
138-74, accessed January 13, 2022, https://doi.org/10.1162/jcws_a_01009.

20 Levin, I posokh, 17. 

21 Partial liberalization, the so-called Khrushchev Thaw, was initiated and pursued by the First
Secretary of the Central Committee (TsK) of the CPSU, Nikita Khrushchev (1894-1971).

22 On the “generation of the sixties” in the USSR see: Petr Vail and Aleksandr Genis, 60-e. Mir
sovetskogo cheloveka (Moscow: Corpus, 2013); Vladislav Zubok, Zhivago's Children: The Last
Russian Intelligentsia (Cambridge, MA: Belknap Press, 2009); for an overview, see also Georgii
Kas'ianov, Nezgodny: Ukrainskaia inteligentsia v rusi oporu 1960-kh-1980-kh rokiv (Kyiv: Lybid’,
2005), 12-31.
in his memoirs cases of anti-Semitic behavior and anti-Semitic expressions directed
at him or his wife.²³ The BSSR was no exception in this regard; its daily periodicals,
such as Zviazda (Star), regularly condemned Zionism, strongly discouraging
Belarusian Jews from emigrating.²⁴

It was Israel's victory in the Six-Day Arab-Israeli War in June 1967 that made
the idea of emigration palpable for Ernst Levin (as well as for many Soviet Jews).²⁵
Additional incentives were the domestic political situation in the USSR, Leonid
Brezhnev's attempted de-liberalization, and increasing pressure on dissenters
and the intelligentsia.²⁶

Levin’s wife, Asia Levina (née Rudshtein, b. 1939), and their circle of friends
also greatly influenced his views. Asia was born in the small Belarusian town of
Lyuban,²⁷ into a family in which Jewish traditions were cherished and skepticism
toward Soviet authorities prevailed.²⁸ Among the friends of the Levin family were
the Minsk theater designer Tsfania-Gedalia Kipnis, a student from Riga by the
name of Il'ia (later: Eliahu) Valk, and Isaak Zhitnitskii, the son of the artist and
Gulag prisoner Mark Zhitnitskii; all were convinced Zionists.²⁹ The last of these, Isaak
Zhitnitskii, one of the first Jewish emigrants from Minsk, provided the Levin


23 Levin, I posokh, 5 and passim. On late-Soviet nationalities policies toward Jews see, Evgenii
Kazakov, “V poiskakh ‘sovetskogo evreiskogo’: Pozdnesovetskaia natsional’naia politika,”
Neprikosnovennyi zapas 6 (2018): 190-215.

24 See, for instance “Han’ba siianistskim pravakataram” (Shame on Zionist instigators),
Zviazda 17 (January 21, 1972), 3; Safiia Kantar, “Chamu ia ne paedu ŭ Izrail’” (Why I am not
going to Israel), Zviazda 33 (February 9, 1972), 3, etc. See also Alexander Friedman, “Antizionismus
und Anti-Masonismus in der Sowjetunion nach dem israelisch-arabischen Sechstagekrieg
(1967). Der Verschwörungstheoretiker Vladimir Ja. Begun (1929-1989),” in Juden und Geheimnis.
Interdisziplinäre Annäherungen, ed. Claus Oberhauser (Innsbruck: Innsbruck University Press,
2015), 137-51.

25 Gitelman, A Century of Ambivalence, 176-77. The migration intensified not only in the Soviet
Union but also beyond. See, Dariusz Stola, Kraj bez wyjścia? Migracje z Polski 1949-1989 (Warsaw:
ISP PAN, 2010), 177-218; Jannis Panagiotidis, The Unchosen Ones: Diaspora, Nation, and
Migration in Israel and Germany (Bloomington: Indiana University Press, 2019), 193-237.

26 Levin, I posokh, 15. 

27 Before World War II, the Jewish population amounted to more than one-third of all the
town's inhabitants (35.33 per cent). Source: Vsesoiuznaia perepis' naseleniia 1939 goda (The
all-Union population census of 1939), accessed March 1, 2021,
http://www.demoscope.ru/weekly/ssp/ussr_nac_39_ra.php?reg=625.

28 Interview by the author with Asia Levina, Munich, May 21, 2018, in the personal archive of 
the author. 

29 Eliahu (Eli) Valk left the USSR for Israel in 1971. After the dissolution of the Soviet Union,
he served between 1993 and 1996 as the first ambassador of Israel to the independent Republic of
Belarus. For the self-published recollections of Isaak Zhitnitskii on his emigration from Minsk
see, Isaak Zhitnitskii (Itskhak Bar-Zait), “Iskhod iz Minska, 1971 god,” Haifa (2014), Evreiskaia
Figure 1: A structural organization of BSSR administrative organs dealing with Jewish
emigration. A scheme drafted by Ernst Levin and translated into English.
Source: Forschungsstelle Osteuropa (hereafter: FSO), F. 30.45 (Levin), 1-24.

family with the so-called “vyzov” (invitation or call) necessary for the start of the
emigration process.³⁰

As Minsk Jews began to form a more unified group, they also strove to make
the memory of the Holocaust part of public discourse.³¹ On March 5, 1972, the
commemoration of the thirtieth anniversary of the Minsk Ghetto massacre, in
which 5,000 people were slaughtered, took place. This gathering became one of
the first collective manifestations organized by the Minsk Jewish community and
also played a role in Ernst Levin’s emigration story.³²


4 Petitions as a Weapon and as a Dataset 


It took Ernst Levin and his family 582 days, or 83 weeks and 1 day, to emigrate
from the Soviet Union.³³ The invitation arranged by Zhitnitskii arrived by post on
April 28, 1971. On August 29, 1972, approval for the Levins' visa application was
granted, though it required three more months to collect and pay the so-called
“diploma tax,” introduced by the authorities to limit the number of applications.³⁴
Eventually, the family crossed the border with the Polish People's Republic near
the Belarusian city of Brest on November 30, 1972.
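
The duration given above can be reproduced with simple date arithmetic. The following is a minimal sketch in Python rather than in R, the environment used for this study; only the two dates come from the text, and the variable names are illustrative assumptions.

```python
from datetime import date

# Dates taken from the text: arrival of the invitation and the border crossing.
invitation_arrived = date(1971, 4, 28)
border_crossing = date(1972, 11, 30)

# Subtracting two dates yields a timedelta; 1972 was a leap year,
# so February 29 is counted automatically.
elapsed_days = (border_crossing - invitation_arrived).days
weeks, extra_days = divmod(elapsed_days, 7)

print(f"{elapsed_days} days = {weeks} weeks and {extra_days} day(s)")
```

Counting from the arrival of the invitation to the day of the border crossing in this way gives the same 582 days, or 83 weeks and 1 day.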

Writing petitions and appeals became a powerful weapon, which Soviet 
refuseniks used in their struggle for emigration. It simultaneously served the inte- 
gration of the movement and helped to publicize the cause of Jewish emigration 
from the USSR outside its borders. Ernst Levin was one of the first Jewish refuse- 
niks in the BSSR who recognized the power of petitioning. The study of Levin's 
archive together with his published recollections and other auxiliary sources 


Wiki-Entsiklopedia/Everyday Jewish Wiki, accessed November 22, 2021,
http://www.ejwiki.org/wiki/Welcome_to_Everyday_Jewish_Wiki.

30 Levin, I posokh, 15; FSO, F. 30.45/ (Levin), fols. 1-3. 

31 Levin, I posokh, 89-93. 

32 Levin, I posokh, 100-101. The massacre took place on March 2, 1942. The history of Jewish
refuseniks in Minsk was briefly referred to in: Ludmilla Alexeyeva, Soviet Dissent: Contemporary
Movements for National, Religious, and Human Rights (Middletown, CT: Wesleyan University
Press, 1984), 175-76; Leonard Schroeter, The Last Exodus (Jerusalem: Weidenfeld and Nicolson,
1974), 272-85.

33 Author's calculations. 

34 In Levin's case, it amounted to 15,000 rubles, while Ernst Levin's average monthly wage at
the time was about 200 rubles. Levin, I posokh, 164-65. For more see also Gitelman, A Century
of Ambivalence, 183-84; Viktor Dennighaus and Andrei Savin, “‘Kak by Ukaz o evreiiakh ne
otmeniat', a de faktom ne primeniat' L.I. Brezhnev, razriadka i evreiskaia emigratsia iz SSSR v
1972-1973 gg.,” Rossia XXI vek 1 (2013): 130-59.
such as newspaper articles of the time allowed the reconstruction of 121 unique
appeals, which the aspiring emigrant, alone and together with other refuseniks,
made between May 1971 and November 1972. These petitions served as a basis for
a small dataset, predominantly designed to enrich our understanding of the ways
in which the refuseniks communicated with the Soviet authorities.³⁵

The petitions of Soviet citizens were usually written according to a specific
pattern and included obligatory information such as the full name and address of
sender and recipient, a date, and typically a reference to a concrete problem which
should be solved with the assistance of the authorities. They thus represented a
rather standardized form of communication with the authorities. Nonetheless,
they varied in their intonation and contained their own unique stylistic features,
depending on the personality and the position of the supplicant, leaving space
for the “subjective universe” of the author, as Juliane Fürst calls it.³⁶ Along with
some other categories of Soviet citizens, such as participants in the human rights
movement³⁷ and religious dissidents,³⁸ Jewish activists for emigration operated
from the margins of Soviet society. Not seen as loyal Soviet citizens on account
of their activities, they practiced civil obedience (that is, they intended to follow
the letter of the law in their protest actions) and dared to demand respect for
their rights.³⁹ At the same time, as will be shown below through the example of
Ernst Levin’s petitions, their appeals and the message they contained were far
more varied than is often presented.


35 Despite Levin's thorough documentation and the author's careful examination of other avail- 
able sources, as well as his recollections, there could be a small number of other petitions that 
he signed. This could potentially affect the correlations found between the different variables. 
It can be argued with a certain degree of confidence, however, that the generalizations made on 
the basis of the collected data are correct, as the analyzed petitions represent the bulk of Levin's 
appeals. 

36 Juliane Fürst, “In Search of Soviet Salvation: Young People Write to the Stalinist Authorities,”
Contemporary European History 15, no. 3 (2006): 327.

37 The human rights movement started to form in the mid-1960s in the USSR, its principal demand
to the Soviet state being that the provisions of its own law would be observed. For the
history of the human rights movement in the USSR, see, Alexeyeva, Soviet Dissent, 267-398.

38 On religious dissidents in the USSR, see, Alexeyeva, Soviet Dissent, 201-43.

39 On the idea and practice of radical civic obedience in the milieu of Russian dissidents and
its “founding father” Aleksandr Esenin-Vol'pin, see, Benjamin Nathans, “The Dictatorship of
Reason: Aleksandr Volpin and the Idea of Rights under ‘Developed Socialism,’” Slavic Review
66, no. 4 (Winter 2007): 630-63.
5 The First Round of Data Evaluation
and Limitations


Levin’s archive, as well as every single petition he signed, contains more information
than could be taken into consideration in this study. During the first data
evaluation round, more detailed information on the recipients of Levin’s appeals
(addresses), which was available in some cases and included in the initial version
of the dataset, was left out.

Moreover, responses from the authorities were also traceable in some cases, 
yet their number and the information they contained were insufficient and thus 
did not merit inclusion in the dataset which, as a result, focuses only on the stand- 
point of the sender. 

Similarly, the names, ranks, and positions of the addressees (as three separate
variables) were initially collected. From these it became apparent that in the OVIR
(of the Department of Internal Affairs of the Minsk City Executive Committee)
Ernst Levin mostly communicated with the head of its administration, Major
Anton Gurevich, whom he described as an “extreme anti-Semite.”⁴⁰ The question
of why refuseniks appealed to a specific functionary, and with what kind of
message and at what moment, requires further consideration and could add to
the history of Jewish emigration from the BSSR. Yet this question is beyond the
scope of the present article.

In order to contextualize the analysis of Levin’s petitions, an auxiliary dataset 
was also created with altogether 81 addresses of Jewish activists for emigration in 
Minsk in 1971-1972. As was already mentioned, in order to sign the petition (which 
could not be sent anonymously), one had to indicate his or her full address includ- 
ing the number of the apartment. These addresses were plotted on a map, using the 
ggmap package, which extends the ggplot2 package for maps (see Figure 2). 

It is worth mentioning that, unlike in many other post-Soviet cities, which underwent
massive renaming after the collapse of the USSR, the names of Minsk streets
and the city structure have not changed greatly since the 1960s, which makes it possible
to use present-day maps for historical research.⁴¹ Still, some changes did occur. In the
1970s, the post-war “socialist” reconstruction of the city of Minsk was still ongoing.⁴²


40 FSO, F. 30.45 (Levin), 124.

41 Compare to: Andrei Savin, “U Gorbacheva i El'tsina ne bylo politicheskoi voli. Pochemu
Rossia ne izbavilas' ot sovetskikh nazvanii gorodov i ulits,” Lenta.ru, June 7, 2020, accessed
November 29, 2021, https://lenta.ru/articles/2020/07/07/savin_2/.

42 On the post-World War II socialist building of the city of Minsk see Thomas Bohn, Minsk –
die Musterstadt des Sozialismus. Stadtplanung und Urbanisierung in der Sowjetunion nach 1945
(Cologne: Böhlau, 2008).
Figure 2: A distribution of petitioners across the city of Minsk in 1971-1972. The bright-red
dots indicate that more than one appellant was residing at one address, meaning that families
signed the petitions together.


The construction of new streets and city districts led to the disappearance or renaming
of old streets, which in some cases necessitated a reconstruction of the
contemporary street names.⁴³


43 For instance, MOPR street was renamed Kalinina street in 1946, and in 1961 it was renamed
again as Kommunisticheskaia street. Yet the MOPR lane on which one of the appellants resided
(a side street of the former MOPR street) existed until the 1980s. A few old houses and an old
biscuit factory stood on the lane, which disappeared when the buildings were demolished. In this
case I had to define an approximate location for the former lane. Google Maps still does not cover
every street and object in Minsk, so its results have to be double-checked.
As becomes apparent from the map (Figure 2), most of the aspiring emigrants
resided in the center of the city and were located in clusters. The former can be
explained by the distribution of housing resources, which often depended on
the individual's position in the Soviet power hierarchy, as was characteristic of Soviet
cities.⁴⁴ As in the case of Ernst Levin, who inherited his centrally located apartment
from his communist parents, some of the supplicants stemmed from the
families of party functionaries or occupied high-ranking positions. Technical
intelligentsia and professionals settled in newly built apartment houses which
arose along the main Lenin avenue at a certain distance from the center.⁴⁵ Among
the signatories were engineers, doctors, artists, and former Soviet army
officers.⁴⁶ The location in clusters shows that the strengthening of the Jewish
community, which was imperative for collective petitioning, had a certain correlation
with the place of residence.


6 Second Round of Data Evaluation 


After the first round of data evaluation, it was decided to limit the inquiry to the
following questions:

1. What was the temporal distribution of Ernst Levin's petitions between April 
28, 1971 and November 30, 1972? 

2. What were the main addressees of Levin's appeals and where were they
located? Or, more precisely: was he mostly communicating with the republican
(BSSR) authorities in Minsk or central (all-Union) authorities in Moscow?

3. Was he mostly acting alone or did he collaborate with other refuseniks? And 
is there a correlation between individually or collectively signed petitions and 
their recipients? 

4. What were the main channels and the main ways of communication? 


The initial dataset was manually revised with a view to the data's relevance for
these research questions, and the following seven variables were retained: (1) ID,
(2) Date, (3) Mode of Communication (with the following sub-categories: personal


44 Bohn, “Minsk,” 6. 

45 For this information I am grateful to Artur Klinau. See also his book devoted to the (post-)socialist
landscape of Minsk: Artur Klinau, Minsk – Sonnenstadt der Träume (Berlin: Suhrkamp, 2011).

46 Thus, active refuseniks of the time included decorated retired Red Army officers Colonel Lev
Ovsischer and Lieutenant Colonel Naum Alschansky; Tamara, the wife of history professor Nikolai
Poletika; medical doctor Iakov Shulz; and artist Tsfania Kipnis. FSO, F. 30.45, Bl. 110, 130, 148.
visit, telephone call, or written letter), (4) Addressee/Organization, (5) Address/
Destination, (6) Type (individual or collective), and (7) Form (with sub-categories:
inquiry, appeal, request, and complaint). (See also the dataset summary, Figure 3.) Each
variable corresponds to a unique unit of information, appropriate for descriptive
data analysis. Each appeal received a unique ID and was linked to a date. Since
several appeals could be made on one day, the same date can reappear several
times in the dataset.
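
The structure of one row in such a dataset can be sketched as follows. This is an illustration in Python under stated assumptions, not the author's R workflow: the class name, field names, and helper function are hypothetical, the two sample rows are invented and not taken from the archive, and the DD.MM.YYYY date format is assumed from the dataset summary in Figure 3.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Sub-categories as listed in the text.
MODES = {"Personal", "Telephone", "Written"}
TYPES = {"Individual", "Collective"}
FORMS = {"Inquiry", "Appeal", "Request", "Complaint"}


@dataclass(frozen=True)
class Petition:
    """One row of the dataset: the seven retained variables."""
    petition_id: int
    date: date
    mode: str
    addressee: str
    destination: str
    petition_type: str
    form: str

    def __post_init__(self):
        # Reject values outside the sub-categories named in the text.
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.petition_type not in TYPES:
            raise ValueError(f"unknown type: {self.petition_type!r}")
        if self.form not in FORMS:
            raise ValueError(f"unknown form: {self.form!r}")


def parse_date(text: str) -> date:
    """Parse a DD.MM.YYYY string; raises ValueError for impossible dates."""
    return datetime.strptime(text, "%d.%m.%Y").date()


# Hypothetical rows for illustration only; they are not taken from the archive.
rows = [
    Petition(1, parse_date("03.08.1971"), "Written", "OVIR", "Minsk",
             "Individual", "Inquiry"),
    Petition(2, parse_date("03.08.1971"), "Personal", "OVIR", "Minsk",
             "Individual", "Complaint"),
]

# IDs must be unique, whereas the same date may occur more than once.
ids = [p.petition_id for p in rows]
assert len(ids) == len(set(ids))
```

Keeping the ID as the only unique key, while allowing dates to repeat, mirrors the rule stated above that several appeals could be made on one day.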

While the definition of most variables is self-evident, the seventh variable, the
“Form of Communication,” requires a short explanation. The accessible correspondence
and conversations, including those about which Levin reported, vary
in their tone and message. After careful examination of the applications as well as
the context in which they arose, I have sub-categorized them as follows.⁴⁷ Appeals
are applications which imply a call, a clear message such as “let us
go home” (otpustite nas domoi) or “help us go home.” Requests normally contain
expressions like “we ask” (my prosim) or “we demand” (my trebuem). Inquiries are
formal requests for information, usually concerning the status of a visa application.
Complaints (zhaloba) are typically applications to the higher authorities or prosecutorial
bodies to attract attention to the injustice happening “on the ground,”
objecting to it, and demanding justice.

The corresponding information was loaded into an MS Excel spreadsheet,
which was then converted into a CSV table compatible with R. Ernst Levin
addressed 47 Soviet and international organizations (and individuals), located
in ten different cities in the Soviet Union and abroad (see also the summary of
the dataset, Figure 3). Most of Levin's applications were made in writing
and were signed individually (74 and 83 of 121 respectively). He communicated
most frequently with the local authorities in Minsk (77 applications), and the main
addressee of his applications was the BSSR Ministry of Internal Affairs (24 applications).
As the dataset is small and the results somewhat foreseeable, such a
preliminary understanding could be reached without the help of digital means,
yet even at this stage, the analysis makes it easier to comprehend some patterns.
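
How a CSV table of this kind yields such counts can be illustrated with a short sketch. It is written in Python for illustration and is not the author's R code; the column names are assumed from the headers of the dataset summary, and the five rows are invented placeholders rather than archival data.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export; the column names follow the summary headers,
# the rows are invented for illustration.
CSV_TEXT = """ID,DATE,MODE,ADDRESSEE.ORGANISATION,DESTINATION,TYPE,FORM
1,03.08.1971,Written,OVIR,Minsk,Individual,Inquiry
2,03.08.1971,Personal,OVIR,Minsk,Individual,Complaint
3,30.09.1971,Written,BSSR Ministry of Internal Affairs,Minsk,Collective,Appeal
4,01.03.1972,Written,"Chairman, USSR Supreme Soviet",Moscow,Collective,Appeal
5,02.05.1972,Telephone,OVIR,Minsk,Individual,Inquiry
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# One-way frequency tables, comparable to a summary of categorical variables.
mode_counts = Counter(row["MODE"] for row in rows)
destination_counts = Counter(row["DESTINATION"] for row in rows)

# Two-way table: how often each TYPE is sent to each DESTINATION.
type_by_destination = Counter((row["TYPE"], row["DESTINATION"]) for row in rows)

for (petition_type, destination), n in sorted(type_by_destination.items()):
    print(f"{petition_type:<10} {destination:<8} {n}")
```

The one-way counts correspond to the kind of figures reported above (for example, the number of written applications), while the two-way table is the simplest way to look for a relationship between two variables, such as type and destination.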

The seven variables in the dataset formed the basis for several visualiza- 
tions that were used as an exploratory tool which served to sharpen, reframe, 
and support the analysis of Levin's tactics of interaction with the authorities as 
reflected in his petitions. 


47 As mentioned, these categories may require reconsideration or extension during further in- 
vestigation. 
ID: Min. 1.00; 1st Qu. 30.75; Median 60.50; Mean 60.50; 3rd Qu. 90.25; Max. 121.00; NA's [illegible]
DATE: 01.03.1972: 6; 02.05.1972: 5; 03.08.1971: 4; 03.11.1971: 4; 30.09.1971: 4; 31.09.1971: 4; (Other): 94
MODE: Personal: 33; Telephone: 14; Written: 74
ADDRESSEE.ORGANISATION: BSSR Ministry of Internal Affairs: 24; OVIR: [illegible]; BSSR Minister of Internal Affairs: 9; Minsk City Procurator: 6; TsK, Belarusian Communist Party: 5; Chairman, USSR Supreme Soviet: 2; (Other): 62
DESTINATION: Minsk: 77; Moscow: 27; Jerusalem: 5; New York: 3; Paris: [illegible]; Washington: 2; (Other): 5
TYPE: Collective: 38; Individual: 83
FORM: Appeal: 39; Complaint: 27; Inquiry: 38; Request: 17

Figure 3: Dataset “petitions” summary.


7 Visualizing Petitions in R 


R is a high-level programming language created primarily for data analysis and
visualization and contains a vast collection of libraries suitable for almost every
kind of historical analysis.⁴⁸ For this article, which relies on visualizations of a
small dataset, R seems to be an appropriate tool for exploring both the quantitative
and the qualitative dimensions of petitioning practices. The visualizations
have been created with the help of RStudio, an integrated development
environment (IDE) for R, which allows a relatively simple
and user-friendly application of R.⁴⁹ I used ggplot2, one of the available tidyverse
packages for R, which is specifically developed for creating graphs and which
bases every graph on three primary components: a dataset, a system of coordinates,
and geoms (visual marks representing data points). In the process of data
analysis, four groups of visualizations were created, which allowed for the evaluation
and exploration of correlations between the defined variables.

The first group of visualizations (Figures 4-7) looks into the application process
and its development over time, showing the number of petitions (sub-divided 


48 The advantages of R for historical research have been discussed in: Sharon Howarth, “Five 
Reasons for Historians to Learn R," accessed July 17, 2021, https://www.dataquest.io/blog/five- 
reasons-for-historians-to-learn-r/, and Lincoln Mullen, “Digital History Methods in R,” accessed 
July 17, 2021. 

49 Mullen, “Digital History Methods." 
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according to the corresponding variables in different graphs) per month over a 
period of 20 months starting from April 1971 and ending in November 1972. 

Figure 4 displays the applications on a timeline, grouped by the mode of
communication. Communication appears to have been most intensive between
August 1971 and April 1972, peaking in October-November 1971 and in the spring
of 1972, with written applications prevailing. Figure 5 demonstrates that most of
the collective applications were issued in the same period (fall 1971 and spring
1972). Notably, once his exit visa application had been approved, Ernst Levin did
not completely refrain from sending petitions (as he still had to clear the issue of
the “diploma tax”) but preferred to do so individually. Figure 6 adds the destination
of the application to this representation; its most colorful part is the period between
February and May 1972, when applications were sent not only to Minsk and Moscow
but also to Tel Aviv, Jerusalem, Washington, New York, Paris, London, and Rome.
This period coincides with the rise of collective applications. Figure 7 looks more
closely at the forms of applications and, somewhat surprisingly, indicates that
complaints were more characteristic of the initial phase of the emigration process
(June-October 1971). They were subsequently equaled and even outnumbered by
appeals.


[Figure 4: timeline plot of petitions per month; legend “Mode of Communication”: Personal, Telephone, written.]


Figure 4: Distribution of petitions on a timeline by the mode of communication. 
Figure 7: Distribution of petitions on a timeline by the form of application.


The second group of graphs (Figures 8-10) focuses on the addressed organizations
and officials and their correlation with the form, mode, and type of application.
All three graphs make clear that the BSSR Ministry of Internal Affairs (Ministerstvo
Vnutrennikh Del BSSR) was the most regularly approached organization, with second
and third places taken by OVIR (the Department for Visas and Registration) and the
BSSR Minister of Internal Affairs, Aleksei Klimovskoi, respectively. Figure 8 shows
that Levin approached the Ministry and OVIR mostly with inquiries. Complaints (for
instance, in cases where he received no answer from the subordinate authorities or
believed he had been treated unlawfully) he addressed to the Minister himself, to
the Procurator General of the USSR, Roman Rudenko, or to the Minsk City Procurator,
Leonid Dedkov. Occasionally he also approached the Central Committee of the
Belarusian Communist Party and the CPSU. Generally, appeals were mostly addressed
to the higher authorities and even more often to international organizations, such
as the United Nations (UN).

Figure 9 demonstrates that the BSSR Ministry of Internal Affairs was approached
by all modes of communication, yet mostly in person and by telephone (arguably
also due to Ernst Levin's place of residence, as his apartment was located just
opposite the Ministry and telephone calls to Moscow or abroad were much more
expensive). Both collective and individual applications were addressed to the
Ministry, though individual applications prevailed. And whereas local authorities
were more often approached individually (Figure 10), collective applications were
predominantly directed to international organizations.

Figure 8: Forms of application by addressed organizations.
Figure 9: Mode of communication by addressed organizations.
Figure 10: Type of application by addressed organizations.

Figures 11 and 12 represent an attempt to combine (and double-check) the two
previous questions regarding the correlation between time, the addressees of the
applications, and the way they were approached. The graphs confirm that between
August 1971 and February 1972 Levin communicated most intensively with the local
authorities, after which he started to summon more international attention. This
shift in communication practices was very likely motivated by developments in
Soviet international politics. On May 22-30, 1972, US President Richard Nixon
visited Moscow. In January and June of the same year, the USSR hosted the newly
elected UN Secretary-General Kurt Waldheim. Soviet dissidents and refuseniks alike
anticipated these visits with high hopes.50 Thus it appears that both collaboration
with other refuseniks and international attention were instrumental in advancing
his case.

The final pair of graphs (Figures 13 and 14) focuses on the destination and
asks about the forms and types of applications sent by Levin. Minsk (which
dominated considerably, with 77 appeals) and Moscow (27 appeals), the locations
of the republican and the all-Union authorities respectively, were the main
destinations for Levin's petitions. All forms of applications, signed both
individually and collectively, were sent to these destinations. There were
certainly more individual applications (approximately 69.1% of the total number),
but the number of collective ones (see Figure 5) was still significant, though it
dropped during the last three months, when the Levins were getting ready to depart.


8 Conclusion 


The ultimately successful outcome of Levin's emigration attempt was, it seems,
a consequence of both personal initiative and agency and developments in
international politics, in particular the relaxation of Soviet-US relations. The
Levins were lucky to catch the first big wave of emigration, and as Levin's case was
among the first in Soviet Belarus, it attracted the attention of Israeli and Western
media and was closely observed and supported from abroad.51 At first glance,
then, Levin's case may be treated as typical or even not particularly remarkable
within the history of Soviet Jewish emigration. His emigration was relatively quick
and unproblematic; he did not face criminal charges and was not persecuted.
Although he time and again supported his counterparts, Ernst Levin did not have
larger political aims; his primary objective was his and his family's emigration.


50 The immediate effect of these visits was rather the opposite: the potential “disturbers” were
threatened or even preventively detained. “Nikson,” KhTS 26 (July 5, 1972).

51 See, for instance, Nitzon, “Third Wave of Arrests: Minsk Jews Told ‘No Weeping,’” Jewish
Observer and Middle East Review, March 10, 1972; Levin, I posokh, 97, 148-55.

Figure 13: Destination by the form of application.

And yet the emigration case of Ernst Levin is unique, because of his writing 
and organizational talent, his ability to formulate his petitions strictly to the 
letter of the relevant Soviet legislation, and his facility for adapting his appeals 
depending on the responses of the authorities. All this led him and his family 
towards successful emigration, which otherwise could have been a much more 
difficult and lengthy process. 

Analyzing and visualizing his petitions with R reveals the specificity of
Levin's case on various levels and allows for some generalizations. This becomes
immediately apparent in answering the first question about the distribution of
petitions over time, where visualization is instrumental in revealing patterns that
are not necessarily obvious from an initial study of the petitions themselves. The
distribution of petitions on a timeline underlines how sustained his effort was over
the period and helps explain the tactics that Levin used to optimize his emigration
strategy. He switched from a more demanding line of communication with the
authorities to a more moderate one, collaborated with other refuseniks, and appealed
for international support, while reducing his communication to a minimum during
the last three months of his stay in the USSR. Furthermore, he combined different
forms of appeals depending on the context and, presumably, on the responses he
obtained from the authorities.

Figure 14: Destination by the type of application.

The number and frequency of the appeals that Levin made to the Belarusian
authorities were remarkable. Even though the final decision about the status of
his emigration could have been taken in Moscow, the main communication happened
inside the BSSR. This may hint at the meaning and specificity of the emigration
process as it took place in the Soviet periphery, though more data on the cases of
other refuseniks are needed to confirm this assumption. Likewise, the graphic
display of Ernst Levin's petitions to the Soviet authorities and international
organizations brings to light the emigration efforts on the periphery of the Soviet
Union during its initial phase (in the early 1970s). Even if these efforts, in this
case in the Jewish community in Minsk, were most probably inspired by Jewish
activity in Moscow, Leningrad, Kyiv, Kharkiv, Riga, and other big cities across the
USSR, at this moment they were largely isolated.

The data visualizations highlight two important periods of intensification of
Ernst Levin's struggle for emigration, fall 1971 and spring 1972, which coincided
with an increase in refusenik activity in Minsk. In the fall of 1971 this resulted in
the approval of the applications of several Belarusian Jewish families.52 In the
spring of 1972 the first collective public event, commemorating the victims of the
Holocaust in Minsk, took place. The Jewish struggle for emigration in Minsk now
reached a new level, as emigration efforts and memory work collided and created
new momentum for the former.

To some extent, addressing the authorities was already an act that embod-
ied change, as a common citizen transformed into a political subject with his
or her agency becoming apparent.53 In this way, he or she entered into relations
with state institutions as an equal and full (at least outwardly) subject. Ironically 
enough, subjects become citizens at the moment they decide to abandon their 
citizenship. The visualizations display this emerging agency in detail. 

By working with a small dataset we can zoom in from the general category 
of Soviet Jews, and more precisely of Soviet refuseniks, to the local community 
where the individual decision of emigration, as well as individual effort and per- 
sonal connections within and outside the Jewish community, becomes visible. 
In this regard, a further study of other refusenik groups across the Soviet Union 
could yield a fruitful perspective for comparison. 

The use of R in this study is rather arbitrary, as it can arguably be substituted
by tools available in other programming languages, especially Python. Nevertheless,
R proved to be a handy tool for the analysis and visualization of the small
dataset, in which expenditures of time and effort are balanced and proportional
to the results. Importantly, visualization is necessary but not sufficient for a full
understanding of the emigration process; for that, a combination of digital and
traditional methods remains necessary, through an approach in which computational
and human readings complement each other.

This study demonstrates how a relatively small amount of data can be ana-
lyzed and explored to yield new results. Especially in Jewish Studies with its great
variety of languages and cultures spanning many centuries and regions, atten- 
tion to “small” histories can not only enrich our existing knowledge but also help 
to reframe and revise existing research. Thus, as the story of Ernst Levin and 
its digital interpretation shows, much potential remains to further explore the 
history of Eastern European Jewry, and the Jewish movement for emigration from 
the USSR in particular. 


52 Levin, I posokh, 59.

53 Compare to Fürst, “In Search of Salvation," 328. 

54 See also Gerben Zaagsma, “#DHJewish - Jewish Studies in the Digital Age," Medaon - Mag- 
azin für jüdisches Leben in Forschung und Bildung 12, no. 23 (2018): 1-11. 
Luigi Bambaci
Digitizing Kennicott's Collation 
of the Hebrew Bible 


Experiences of Encoding and of Computer-assisted 
Stemmatic Analysis 


Abstract: This article describes the creation of a rule-based parser for digitiz- 
ing Kennicott's collation of the Hebrew Bible (Section 1). We will illustrate how 
it is possible to exploit the tree-like structure inherent in the critical apparatus 
of Kennicott's collation in order to generate XML code automatically via a con- 
text-free grammar (Section 2). Finally, we will present an experiment of com- 
puter-assisted stemmatic analysis of the manuscript tradition of the book of 
Qohelet, which was made possible by encoding (Section 3). Both the digitization 
of Kennicott's apparatus and the stemmatic analysis of the textual tradition are 
part of an ongoing project devoted to the preparation of a born-digital eclectic 
edition of the book of Qohelet. 


Keywords: Hebrew manuscripts, textual history of the Hebrew Bible, text encod- 
ing, computer-assisted stemmatology, Digital Humanities 


1 Introduction 


The collations of Kennicott (1776, 1778) (K) and De Rossi (1788, 1798) (DR) represent 
to this day our main source of information about the textual history of the Hebrew 
Bible (HB) from the late Middle Ages to the first centuries after the invention of 
printing. 

K and DR gathered an enormous number of variant readings of the HB, consulting
thousands of textual witnesses, both manuscripts and printed editions. The
work of K is particularly extensive: according to Barthélemy, his critical apparatus
contains something like 1,500,000 pieces of textual information,1 collected from
approximately 600 witnesses.2 To this corpus DR added new data, collating more
than 500 additional manuscripts and 200 printed editions.3


Note: We would like to thank the researchers of the Italian National Council of Research, in
particular the members of the Laboratory of Collaborative and Cooperative Philology (CoPhiLab) of
the Institute of Computational Linguistics “A. Zampolli” of Pisa, Angelo Mario Del Grosso and
Riccardo Del Gratta. A special thanks to Federico Boschetti from The Venice Centre for Digital and
Public Humanities of Università Ca' Foscari of Venice for his support and collaboration. The
software component mentioned in Section 2.4 has been designed by him.


Open Access. © 2022 Luigi Bambaci, published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://doi.org/10.1515/9783110744828-014

Until now, the two collations have been used primarily to compile critical 
editions and textual commentaries. In some studies, the data provided by the 
two 18th-century collators are exploited more systematically for investigations of
various kinds, from textual history to textual criticism.4 Managing such a large
amount of material efficiently is clearly difficult and cannot in fact be accom- 
plished except by limiting oneself to relatively small samples. Even with these, 
however, it is often necessary to resort to mechanical assistance to extract infor- 
mation relevant for research. 

Thus, the digitization of critical apparatus would represent for HB philol- 
ogists an important advance, since it would permit a huge variety of analyses, 
both qualitative and quantitative, with the aid of the computer. 

Proprietary languages or software that are normally used can, however, 
limit data exchange and hence prevent the possibility of verifying both data and 
methods adopted. 

The markup language promoted by the Text Encoding Initiative (TEI) is an 
optimal solution in this respect. It permits us to query critical apparatuses and to 
extract information in an efficient way, whatever the sample size to be examined. 
TEI international standards also allow scholars to exchange data easily, which 
opens up the possibility not only to control results and methods more effectively, 
but also to reuse the data for further research. 

In the following sections, we attempt a concrete implementation of this approach,
using as our object of study a small book of the HB, namely Qohelet (Q).6 We will
describe the procedures which enabled us to encode the critical apparatus of Q as
printed by K, and we will thereafter demonstrate how we processed the encoded
data to carry out a special kind of quantitative inquiry, namely a computer-assisted
stemmatic analysis.7


1 D. Barthélemy, Critique textuelle de l'Ancien Testament, 1. Josué-Esther, vol. 1 of Orbis Biblicus
et Orientalis 50 (Fribourg and Göttingen: Éditions Universitaires and Vandenhoeck & Ruprecht,
1982), 28 ff.

2 E. Tov, Textual Criticism of the Hebrew Bible, 3rd ed. (Minneapolis: Fortress Press, 2012), 37.

3 G.B. De Rossi, Variae lectiones Veteris Testamenti (Parma: Ex regio typographeo, 1784-1788),
1:xlvi-i.

4 See in particular the works of M. Cohen and J. Penkower and the studies in stemmatology
quoted in Section 5.

5 Accessed February 26, 2021, https://tei-c.org/.

6 B. Kennicott, Vetus Testamentum Hebraicum cum variis lectionibus, vol. 2 (Oxford: Clarendon,
1778), 549-61.


2 Encoding Through a Parser 


Encoding a critical apparatus of printed editions or collations is a complex task. 
Critical apparatuses are designed to express maximum content within minimum 
space, so the data they contain are highly information-dense. As a consequence, 
such data need to be encoded in a fine-grained manner, employing a rich set of 
markers suitable for making the function of each apparatus component explicit 
for the machine. This makes manual encoding a time-consuming and error-prone 
enterprise. 

However, when the language of a critical apparatus is sufficiently formalized,
it is possible to automate the encoding operation by means of a parser, that is,
a piece of software capable of analyzing formal languages and thus of recognizing
the various parts of which a critical apparatus is composed.

The method we adopted to obtain a compliant XML-TEI encoding from a .pdf copy
of K's printed apparatus (§ 2.1) consisted of four phases, described in detail in
this section: (1) digitization of the original source through optical character
recognition (OCR, § 2.2); (2) creation of a context-free grammar (CFG) describing
the language of the critical apparatus (§ 2.3); (3) implementation of a general
exporter to produce XML code (§ 2.4); (4) composition of XSL-T stylesheets to
convert the XML into XML-TEI (§ 2.5).8


2.1 Kennicott's Critical Apparatus 


K employs a rigorous annotational system to record variant readings. K's
apparatus, unlike that of DR, strongly minimizes the use of natural language: each
entry is organized within a structure which lists the apparatus components in a
pre-established order, using an unambiguous terminology and a finite set of
typographical conventions. This feature makes K's apparatus highly suitable for
automated processing through user-defined rules, as we are about to show.


7 The data from K's collation taken into consideration here as well as the tools used to process
them, from the context-free grammar to the XSL-T stylesheets, can be found on our GitHub
repository: https://github.com/LuigiBambaci/Kennicott.

8 For more technical details about the implementation of the parser we refer to our recently
published article L. Bambaci, “Critical Apparatus as Domain Specific Languages: A Rule-Based
Parser for Encoding an Eighteenth-Century Collation of Hebrew Manuscripts,” International Journal
of Information Science and Technology 5, no. 1 (2021): 22-33.

Let us begin by examining a few examples of K's apparatus taken from Q 1:1 
(Figure 1). As can be seen, the variants are listed by verses, which are marked by 
numbers followed by dots (‘1.’, ‘2.’ etc.). Within a verse, one finds the apparatus 
entries, each delimited by a dot and a long white space.9 An apparatus entry rep-
resents a place of variation, that is, a place in the text for which variant readings
are attested. For example, the following entry 


trova 107, 109, 152 — sup. ras. 139 — mbwrva mmm 76. 


means that there exist three readings of the word in the reference text (the lemma 
nbw) as printed by K (the edition of E. van der Hooght, Amsterdam 1705): one 
with scriptio plena (avr), one written over an erasure (sup. ras.), and the third 
with scriptio plena and the addition of a noun (trowrva mmm). The three groups 
of witnesses — expressed here as numerical sigla — have therefore three different 
readings, each separated by a long horizontal line (‘—’). 

More complex cases can occur, as in the following entry: 


11 57, 100, 260; forte 141. 


Here, the annotation means that witnesses 57, 100, 260 have ‘tt instead of 11 of 
the reference text, and that the same is attested also in 141, but as a probable first- 
hand reading (forte). In this case, therefore, there is only a single group of wit- 
nesses with reading 77; in the third witness, however, the reading has probably 
undergone correction by a second scribe, who changed ‘7 into 11 following the 
reference text. To separate witnesses containing particular readings of this sort, K 
uses semicolons or, less frequently, commas. 

Thus, K’s apparatus may be divided into three main units: apparatus entries, 
reading groups, and individual readings. The basic unit is the individual reading: 
here we find Hebrew words, witness sigla and standard Latin annotations that 
specify typographical details (e.g. lit. majorib.), the copyist's hand (primo/forte, 
nunc), and other phenomena. Witnesses sharing readings are listed within 
larger units, namely the reading groups. 
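
To make the three-level structure concrete, the following minimal Python sketch splits an
apparatus entry into reading groups and individual readings. It is not the parser described
in this article: the function names, the shortened list of annotations, and the Hebrew
stand-in word are assumptions introduced for illustration only.

```python
import re

# Latin annotations mentioned in this section; the full set in K is larger.
TERMS = ("sup. ras.", "lit. majorib.", "primo", "forte", "nunc")
HEBREW_WORD = re.compile(r"[\u0590-\u05ff]+")
SIGLUM = re.compile(r"[0-9]+")
GROUP_SEPARATOR = "\u2014"  # the long horizontal line between reading groups

# An arbitrary Hebrew string used as a stand-in for a variant reading.
WORD = "\u05d3\u05d5\u05d9\u05d3"


def parse_reading(text):
    """Split one individual reading into annotation, Hebrew words, and witness sigla."""
    term = next((t for t in TERMS if t in text), None)
    rest = text.replace(term, " ") if term else text
    return {
        "term": term,
        "words": HEBREW_WORD.findall(rest),
        "witnesses": [int(s) for s in SIGLUM.findall(rest)],
    }


def parse_entry(entry):
    """Return an apparatus entry as a list of reading groups, each a list of readings."""
    body = entry.strip()
    if body.endswith("."):
        body = body[:-1]  # drop the dot that closes the entry
    groups = []
    for group_text in body.split(GROUP_SEPARATOR):
        # Only semicolons are treated as reading separators here; commas are
        # left to separate sigla, which sidesteps K's occasional use of commas.
        groups.append([parse_reading(r) for r in group_text.split(";")])
    return groups


simple = parse_entry(f"{WORD} 57, 100, 260; forte 141.")
complex_entry = parse_entry(f"{WORD} 107, 109, 152 \u2014 sup. ras. 139 \u2014 {WORD} {WORD} 76.")
print(simple)
print(len(complex_entry))
```

A reading such as "forte 141" carries no Hebrew word of its own in this sketch; in the
apparatus it shares the word of its reading group.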


9 For the sake of simplicity, these are encoded as tabulations in the transcription contained in
the .txt file (see § 2.2).

10 A description of K's apparatus can be found in the introduction of the first volume of the 
collation, see B. Kennicott, Vetus Testamentum Hebraicum cum variis lectionibus, vol. 1 (Oxford: 
Clarendon, 1776), i-iv, 68 ff. 
These three elements (apparatus entry, reading group, and individual reading)
must always be present for each verse. Other elements are, by contrast, optional,
for example the lemma: when the link to the reference text is immediate, the
lemma is omitted altogether, as in the examples just illustrated; where there is no
such immediacy, however, the lemma is explicitly reported at the beginning of an
apparatus entry (e.g. at verse 3 in Figure 1), followed, in case of ambiguity, by
the number specifying the occurrence of the word (‘1°’, ‘2°’ etc.).

Other phenomena can occur, such as marginal readings, lacunae, transpo- 
sitions, etc., each of which corresponds to a precise type of annotation in the 
apparatus. 

K's language is, therefore, highly formalized: each element is easily definable
by the ‘type’ it belongs to (alphabetic character, numeral, special symbol) and by 
the position it occupies within the overall structure. As we shall see, these two 
features — string class and syntax - are sufficient to instruct a machine to iden- 
tify automatically the apparatus components and to assign to them a function, 
without recourse to direct user intervention to render such information explicit. 


2.2 Optical Character Recognition 


The first step in making the data machine-readable is digitization by means
of OCR software. For this purpose we acquired a .pdf version of K's collation
(Figure 1a), which is freely available on platforms such as Google Books and
Archive, and then we used the software Tesseract11 to obtain a digitized version
of the critical apparatus.12

K's apparatus is particularly difficult to digitize because of the presence of
two different alphabets (Latin and Hebrew) and of special symbols, and because
of the need to retain most typographical details of the printed source, such as
newlines and tabulations, which are crucial for parsing. The output, therefore,
needs to be corrected through a careful phase of pre-processing. This phase,
however, can be semi-automated by using a CFG: since a CFG, as we will explain
later (§ 2.3), permits us to check whether a given language is valid according to a
set of pre-defined formation rules, it becomes possible to detect automatically
certain types of errors produced by OCR processing, such as those concerning
misspellings of technical terms and incorrect segmentations of the apparatus
structure. This way, manual correction of the OCR results can be significantly
improved, as we will discuss in Section 4.


11 Accessed February 26, 2021, https://tesseract-ocr.github.io/.
12 The reference text printed above the critical apparatus was not digitized.

After revising the results produced by the OCR software, we obtained a
.txt file containing the transcription of the original .pdf source (Figure 1b).
This file was the one we used for our parsing analysis, as related in the
following sections.
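
The detection of misspelled technical terms mentioned above relies, in our workflow, on the
CFG. As a much simpler stand-in for that check, the following Python sketch compares an
OCR string against a closed list of annotations by fuzzy matching. The function name suggest,
the cutoff value, and the shortened term list are assumptions made for illustration.

```python
import difflib

# Closed vocabulary of Latin annotations, taken from the examples in this article.
TERMS = ["sup. ras.", "lit. majorib.", "primo", "forte", "nunc", "bis"]


def suggest(term, cutoff=0.6):
    """Return the closest known annotation, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(term, TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A typical OCR slip: the long s of eighteenth-century print read as f.
print(suggest("fup. raf."))
```

A string for which no suggestion is returned would have to be inspected by hand; the cutoff
determines how tolerant the comparison is.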


[Figure 1, two panels: (a) fragment of K's apparatus in .pdf; (b) fragment of K's apparatus in .txt.]


Figure 1: Conversion from .pdf to .txt (example from Q 1:1-4).


2.3 The Context-Free Grammar 


Before building the parser with a view to an automatic encoding, we opted to encode 
Q's apparatus manually, for two main reasons: (1) to study how the various textual 
phenomena are represented in K and how these could be properly expressed in a 
TEI-compliant encoding; and (2) to have a benchmark against which to compare 
the automated encoding and by which to verify the accuracy of the parsing opera-
tion (§ 2.6).

Then, we wrote a CFG using the tools available in ANTLR4, a tool designed to
generate parsers from CFGs.13 A CFG is a grammar consisting of a set of rules that
permits one to analyze a formal language. The analysis is carried out by means of
two kinds of rules: lexer rules and parser rules. The former permit the isolation of
tokens from the textual flow. The latter describe the syntax, that is, how the
different tokens distribute and combine in the apparatus. In ANTLR4, the first task
is performed by the software component named lexer, the second by the parser.


13 Accessed February 26, 2021, https://www.antlr.org/. See also T. Parr, Language Implementation
Patterns: Create Your Own Domain-Specific and General Programming Languages (Dallas, TX and
Raleigh, NC: Pragmatic Bookshelf, 2010) and T. Parr, The Definitive ANTLR 4 Reference (Dallas, TX
and Raleigh, NC: Pragmatic Bookshelf, 2012).


An example of a CFG is shown in Listing 1.14


Listing 1 Context-free grammar. 


grammar kennicottCFG;

all: listApp+;
listApp: loc app+;
app: lem? rdgGrp+ closeApp;
lem: w+ (lemSep|occ);
rdgGrp: rdg+ rdgGrpSep?;
rdg: term? (w+)? term? wits rdgSep?;
w: HEBW;
loc: verse closeLoc;
term: MAN_DESC;
wits: wit+;
sigl: NUM;
wit: sigl com?;
lemSep: VAR_SEP;
rdgGrpSep: VAR_SEP;
com: COMMA;
rdgSep: COMMA|SEMICOLON;
numSign: NUMEROSIGN;
verse: num;
num: NUM;
occ: num numSign;
closeMainApp: NEWLINE;
closeApp: END (TAB|NEWLINE);
closeLoc: END;

MAN_DESC: '\u2038' | 'forte' | 'sup. ras.' | 'lit. majorib.' |
          'vox major, et ornata ;' | 'non major ;' | 'primo' | 'forte' | 'bis';
NUM: [0-9]+;
HEBW: [\u0590-\u05ff]+;
NUMEROSIGN: '°';
VAR_SEP: '—';
END: '.';
TAB: '\t';
COMMA: ',';
SEMICOLON: ';';
NEWLINE: '\r\n';
WS: ' ' -> skip;
LTR: '\u200E' -> skip;
RTL: '\u200F' -> skip;


14 The grammar we present here is a simplified version of the one used to parse the entire
apparatus of Q. It can be used to parse the example we have shown, verses 1-4. The full version
of the grammar can be found on GitHub, see note 7.

In the grammar, the parser rules are listed at the beginning, in lowercase; the
lexer rules are found at the end, in uppercase. In the lexer rules we defined the
tokens, which represent the minimal meaningful units of the apparatus: here we
encoded, for example, the Hebrew Unicode characters (HEBW) necessary to detect
the Hebrew words, the list of Latin annotations (MAN_DESC), the class of numerals
(NUM) by which the witness sigla are expressed, and a series of separators (such
as VAR_SEP, END, TAB), which are important because they segment the critical
apparatus into its minimum components. The elements we want to ignore, such
as white spaces and other special characters, are skipped (-> skip). As can be
seen, the lexer rules are defined by means of regular expressions (e.g. ‘[0-9]+’ in
NUM) or literal values (e.g. the list of technical terms and symbols in MAN_DESC).
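
The role of the lexer rules can be illustrated with a minimal sketch. It is written in Python with the standard re module rather than in ANTLR4, covers only a subset of the tokens of Listing 1, and uses names (TOKEN_SPEC, tokenize) that are our own assumptions for illustration; it is not the lexer that ANTLR4 generates.

```python
import re

# Token patterns mirroring a subset of the lexer rules of Listing 1.
# Order matters: earlier alternatives win when two patterns could match.
TOKEN_SPEC = [
    ("MAN_DESC", r"lit\. majorib\.|sup\. ras\.|forte|primo"),
    ("NUM", r"[0-9]+"),
    ("HEBW", r"[\u0590-\u05ff]+"),
    ("VAR_SEP", "\u2014"),
    ("END", r"\."),
    ("COMMA", r","),
    ("SEMICOLON", r";"),
    ("SKIP", r"[ \u200e\u200f]+"),  # white space and direction marks are ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split a line of apparatus text into (token type, value) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


sample = "1. \u05d3\u05d1\u05e8\u05d9 lit. majorib. 4, 109"
tokens = tokenize(sample)
```

Listing the multi-word Latin annotations before the single-character separators keeps the dot inside 'lit. majorib.' from being read as an END token.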

Once defined, the lexer rules are passed to the parser, which checks the syntax
through the parser rules. Let us look briefly now at how the parser works. In the
example, it begins from the root rule (all), which specifies that the language we
want to analyze consists of one or more lists of apparatus entries (listApp+).
Then it moves to listApp, which states that a list contains a location (loc) and
at least one apparatus entry (app+). For each of these rules, the parser looks for
child rules, all the way down to the lexer rules that contain the input data. In the
case of the rule loc, for example, the parser first matches verse and closeLoc,
which state that a verse consists of numbers and a final dot. Finally, it reaches
NUM and END, in which the tokens in question are listed. This approach (called
top-down, since it proceeds from general to specific statements) is followed for
the other rules of the CFG, until all the elements of the apparatus are properly
recognized and assigned.
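
The top-down descent just described for the rule loc can be sketched as a hand-written recursive-descent parser. This is an illustration under our own assumptions (token pairs as input, nested tuples as parse tree, hypothetical function names), not the parser that ANTLR4 generates from the grammar.

```python
def expect(tokens, pos, kind, rule):
    """Match one token of the given kind and label it with the parser rule name."""
    if pos >= len(tokens) or tokens[pos][0] != kind:
        raise SyntaxError(f"rule {rule}: expected {kind} at token {pos}")
    return (rule, tokens[pos][1]), pos + 1


def parse_verse(tokens, pos):
    """verse: num;  num: NUM;"""
    num, pos = expect(tokens, pos, "NUM", "num")
    return ("verse", [num]), pos


def parse_loc(tokens, pos=0):
    """loc: verse closeLoc;  closeLoc: END;  Returns (tree, next position)."""
    verse, pos = parse_verse(tokens, pos)
    close, pos = expect(tokens, pos, "END", "closeLoc")
    return ("loc", [verse, close]), pos


loc_tree, next_pos = parse_loc([("NUM", "1"), ("END", ".")])
```

Each function corresponds to one parser rule and calls the functions of its child rules, so the call stack mirrors the path from the general rule down to the tokens.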

Running the CFG on the plain text of Figure 1b produces a parse tree, fragments
of which are shown in Figure 2.

As can be seen, each apparatus component is attached to a label (technically
corresponding to the name of the parser rule of the CFG), which acts as a mnemonic,
for the user, of the function it plays in the apparatus. The tokens are found
at the bottom of the tree, represented as tree leaves.


2.4 General Exporter 


The third step consisted in the creation of an XML general exporter, in order to 
produce XML code containing the results of the parser analysis. 

We exploited a tree-walking mechanism made available in ANTLR4, named
visitor. In very general terms, it can be said that the visitor traverses the tree's
nodes and transforms them into XML tags. As can be seen from the code shown
on the left of Listing 2, the tag names are taken from the parser rules, which in the
parse trees of Figure 2 are represented as internal nodes (§ 2.3).
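
The idea of turning tree nodes into XML tags can be sketched as follows. The sketch does not use the ANTLR4 visitor API: it assumes a parse tree represented as nested (rule name, content) tuples and relies on Python's standard xml.etree.ElementTree module; the function name to_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def to_xml(node):
    """Turn a (rule name, content) node into an XML element named after the rule."""
    name, content = node
    elem = ET.Element(name)
    if isinstance(content, str):
        # Leaf of the parse tree: the token text becomes the element text.
        elem.text = content
    else:
        # Internal node: visit the children in order.
        for child in content:
            elem.append(to_xml(child))
    return elem


tree = ("loc", [("verse", [("num", "1")]), ("closeLoc", ".")])
xml = ET.tostring(to_xml(tree), encoding="unicode")
```

For the location of Q 1:1 this yields the same nesting of loc, verse, num, and closeLoc that appears at the top of the left-hand column of Listing 2.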

Although many of the tag names are borrowed from TEI terminology, and in
particular from the vocabulary of Module 12 devoted to the encoding of critical
apparatuses,15 the XML code generated by the general exporter is not TEI, but is
a proprietary language. In fact, it consists of a series of XML elements containing
all the data of the original apparatus, including para-textual elements such as
tabs and newlines. To generate a TEI critical apparatus, an XSL transformation is
needed, as we describe below.


15 TEI Consortium, eds., “12 Critical Apparatus,” TEI P5: Guidelines for Electronic Text Encoding
and Interchange, Version 4.2.0, last updated February 25, 2021, TEI Consortium, accessed February
26, 2021, https://tei-c.org/release/doc/tei-p5-doc/en/html/TC.html.


(b) Q 1:3


Figure 2: Examples of parse trees.


2.5 From XML to TEI 


The fourth and last step consisted in designing an XSLT stylesheet for converting
the XML code into a compliant XML-TEI encoding (Listing 2, on the right).


Listing 2 Conversion from XML to XML-TEI encoding (example from Q 1:1).


 1 <listApp>                           29 <listApp>
 2   <loc>                             30   <app loc="1 1">
 3     <verse>                         31     <lem>
 4       <num>1</num>                  32       <w>דברי</w>
 5     </verse>                        33     </lem>
 6     <closeLoc>.</closeLoc>          34     <rdgGrp>
 7   </loc>                            35       <rdg wit="#K136 #K139">
 8   <app>                             36         <term>vox major et ornata</term>
 9     <rdgGrp>                        37       </rdg>
10       <rdg>                         38     </rdgGrp>
11         <w>דברי</w>                 39     <rdgGrp>
12         <w>קהלת</w>                 40       <rdg wit="#K1 #K2 #K3...">
13         <term>lit. majorib.</term>  41         <term>non major</term>
14         <wit>                       42       </rdg>
15           <sigl>4</sigl>            43     </rdgGrp>
16           <com>,</com>              44   </app>
17         </wit>                      45   <app loc="1 1">
18         <wit>                       46     <lem>
19           <sigl>109</sigl>          47       <w>דברי</w>
20         </wit>                      48       <w>קהלת</w>
21       </rdg>                        49     </lem>
22     </rdgGrp>                       50     <rdgGrp>
23   </app>                            51       <rdg wit="#K4 #K109">
24   <app>                             52         <term>lit. majorib.</term>
25     <lem>                           53       </rdg>
26       <w>קהלת</w>                   54     </rdgGrp>
27     </lem>                          55   </app>
28 </listApp>                          56 </listApp>


As can be seen, the file contains lists (<listApp>) of apparatus entries (<app>),
consisting in turn of (lists of) reading groups (<rdgGrp>). The witnesses are
encoded in the attribute @wit of the element <rdg>, while information about
the status of the variant readings (e.g. the Latin annotations of the original
apparatus) is encoded in the element <term>.
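
The core of the conversion, which gathers the witness sigla of a reading into the attribute @wit and drops the separators, can be sketched in a few lines. The sketch is in Python rather than XSLT and is not our stylesheet; the function name rdg_to_tei and the choice of prefixing each siglum with '#K' (as in the right-hand column of Listing 2) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# A reading as produced by the general exporter (cf. Listing 2, left column).
RAW = """<rdg><term>lit. majorib.</term>
<wit><sigl>4</sigl><com>,</com></wit>
<wit><sigl>109</sigl></wit></rdg>"""


def rdg_to_tei(raw_rdg):
    """Build a TEI-style <rdg>: sigla go into @wit, separators are dropped."""
    tei = ET.Element("rdg")
    sigla = [sigl.text for sigl in raw_rdg.iter("sigl")]
    tei.set("wit", " ".join(f"#K{sigl}" for sigl in sigla))
    for child in raw_rdg:
        # Only words and Latin annotations are philologically relevant here.
        if child.tag in ("w", "term"):
            kept = ET.SubElement(tei, child.tag)
            kept.text = child.text
    return tei


tei_rdg = rdg_to_tei(ET.fromstring(RAW))
result = ET.tostring(tei_rdg, encoding="unicode")
```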

The stylesheet is customizable: here we decided to eliminate elements found 
in the printed source which are philologically irrelevant, such as separators and 
other typographical symbols. 

The method chosen to link the apparatus to the text is the location-referenced
method, which is the one usually recommended for digitizing printed critical
editions.16


16 TEI Consortium, eds., “12.2.1 The Location-Referenced Method,” TEI P5: Guidelines for Electronic
Text Encoding and Interchange, Version 4.2.0, last updated February 25, 2021, TEI Consortium,
accessed February 26, 2021, https://tei-c.org/release/doc/tei-p5-doc/en/html/TC.html.
2.6 Evaluation


After we produced our XML-TEI code through XSL transformation, we compared
the automatically encoded file with the one we had previously encoded by hand
(§ 2.3), in order to assess the effectiveness of both parser analysis and XSL
transformation.

The results showed that the parser as well as the XSLT stylesheet performed
very well indeed: the apparatus was correctly parsed pursuant to the rules
established in the CFG, and the approximately 2,600 variants17 of Q have been correctly
represented in a TEI-compliant encoding.

Such results should not, in fact, surprise us: the CFG was specifically designed 
to embrace all the textual phenomena found in the apparatus of Q, so that maximum 
accuracy in this case was rather expected. 

We decided, therefore, to digitize the collation of the other Megillot,18 and
to subject it to the same procedures described above. The goal was to test the
robustness of the parser, that is, to verify whether the rules we set up in the CFG
were valid only for the apparatus of the book of Q, in which case our model would
have been overfitted and hence scarcely applicable, or whether they could be
generalized, and hence used to digitize the other books of K's collation.

This time, several mismatches (syntactic errors) were reported by the parser 
during the analysis of the new data, meaning that it was not able to classify all the 
variants contained in the apparatus of these other biblical books. 

Upon closer examination, however, such errors turned out to be few (less
than a dozen, out of a total of about 4,000 variant readings) and of little
importance: most are due to the presence of technical terms not listed in the CFG and
of textual phenomena that do not occur in the book of Q. We therefore modified
the grammar rules in order to encompass these other phenomena, and we ran the
parser again with this latest version of the CFG. On this second run, the parser
succeeded in analyzing the apparatus without syntactic errors.

As a final step, we selected several passages in the apparatus where more 
complex textual phenomena occur, such as double or marginal readings, and 
we analyzed the output produced by the parser in order to detect the presence 
of incorrect classifications (semantic errors). Although it cannot be completely 
ruled out that some entries were actually misinterpreted by the parser, the sample 
survey we carried out seems for the moment to exclude this possibility. 


17 The estimate refers to the number of <rdgGrp> elements. 

18 These are: Song of Songs (Kennicott, Vetus Testamentum Hebraicum, 2:525-33), Ruth (2:534-39),
Lamentations (2:540-48), and Esther (2:562-72).

19 Both .txt and .xml files of these books can be found on our GitHub repository, see note 7.

In conclusion, we feel confident in affirming the strong applicability of a
rule-based parsing system to K's collation, due in particular to the formal language
of his apparatus. The CFG we designed, though initially developed in relation to
the variant readings of a single biblical book, proved to be such a flexible and
easy-to-implement tool that other books of K's collation could be automatically
encoded as well, using the identical methodology.


3 Computer-Assisted Stemmatic Analysis 


Encoding converts textual data into machine-actionable format. This makes it 
possible to query the critical apparatus and to extract information from it through 
different kinds of techniques. 

In textual criticism, one possible use of the encoding of critical apparatuses is
stemmatic analysis.20 Since in an encoded critical apparatus the witnesses are aligned
according to the readings they share, this information can be exploited in order to
compute genealogical relationships between witnesses and to establish stemmata
codicum following a (neo-)Lachmannian, computer-assisted methodology.21

In the domain of HB studies, the genealogical approach, of which Lachmann's
method is one of the best known instances, is quite exceptional22 and
has never been applied, to the best of our knowledge, to medieval manuscripts.
In contrast with New Testament studies, in fact, there have been few attempts to
study the medieval Hebrew tradition using stemmatological criteria. The studies
focused on stemmatological issues have been based on non-genealogical statistical
methods,23 and most of them have questioned even the possibility of applying
the genealogical model to the transmission of the HB text in the Middle Ages.24


20 The basics of the stemmatic method are outlined in the classical work of P. Maas, “Textkritik,”
in Einleitung in die Altertumswissenschaft, 2nd ed., vol. 1.2, ed. A. Gercke and E. Norden (Leipzig:
Teubner, 1927), 1-18 and P. Maas, Textkritik, 4th ed. (Leipzig: Teubner, 1960). For a recent
introduction to stemmatology see P. Roelli, ed., Handbook of Stemmatology (Berlin and Boston:
De Gruyter, 2020).

21 The bibliography on computer-assisted stemmatology is extensive. Some of the classical
works are PT. van Reenen, M. van Mulken, and J. Dyk, Studies in Stemmatology, vol. 1
(Amsterdam and Philadelphia: John Benjamins, 1996); B.J.P. Salemans, “Building Stemmas with
the Computer in a Cladistic, Neo-Lachmannian, Way: The Case of Fourteen Text Versions of
Lanseloet Van Denemerken” (Ph.D. thesis, Katholieke Universiteit Nijmegen, 2000); and PT. van
Reenen and M. van Mulken, Studies in Stemmatology, vol. 2 (Amsterdam and Philadelphia: John
Benjamins, 2004).

22 See B. Chiesa, “Textual History and Textual Criticism of the Hebrew Old Testament,” in The
Madrid Qumran Congress, ed. J.C. Trebolle Barrera (Leiden: Brill, 1992), 257-72; B. Chiesa,
Filologia storica della Bibbia Ebraica, vol. 2 of Studi Biblici 135 (Brescia: Paideia, 2000), 399 ff.; and
Tov, Textual Criticism of the Hebrew Bible, 359 ff.

Here, we propose to investigate precisely that possibility, by exploiting
phylogenetic algorithms from bioinformatics.25 The method we employed is basically
quantitative and was performed with the aid of the computer by way of the
encoding. It consisted of four phases: (1) selection of variants considered to be
genealogically significant (regularization, § 3.1); (2) survey of some of the manuscripts
in K's collation as well as of others not included therein (§ 3.2); (3) conversion of
philological data into a numerical format suitable for computational treatment
(§ 3.3); and (4) stemmatic analysis through phylogenetic methods (§ 3.4).


3.1 Regularization 


Since our task here is to recover genealogical relationships among the textual 
witnesses, it is necessary to exclude from the analysis all phenomena of textual 
change which are more likely due to coincident variation (polygenetic variants). 
To accomplish this, we have elaborated an annotation scheme which enables us 
to classify the variants on the basis of typological criteria, so as to select only the 
most significant ones from a genealogical point of view. In this way, the variants 
marked as irrelevant are skipped when generating the data matrix required by the
phylogenetic software (§§ 3.3, 3.4).

23 Such as the clustering algorithm employed in P. Sacchi, “Analisi quantitativa della
tradizione medievale del testo ebraico della Bibbia secondo le collazioni del De Rossi,” Oriens
Antiquus 12 (1973): 1-13 and P.G. Borbone, Il libro del profeta Osea. Edizione critica del testo
ebraico (Torino: Zamorani, 1990), 183-227.

24 For a survey of such studies see D. Barthélemy, “Les manuscrits médiévaux et le texte tibérien
classique,” in Critique textuelle de l'Ancien Testament, 3. Ézéchiel, Daniel et les 12 Prophètes, vol. 3
of Orbis Biblicus et Orientalis 50 (Fribourg and Göttingen: Éditions Universitaires and
Vandenhoeck & Ruprecht, 1992), xix-xxvii.

25 A more complete discussion of this analysis can be found in L. Bambaci, “Is a Stemma
Possible for the Hebrew Bible? Towards a Genealogy of Medieval Manuscripts through Phylogenetic
Analysis,” Materia Giudaica. Rivista dell'associazione italiana per lo studio del giudaismo 26, no.
2 (2021): 3-30.


An example of annotated code is shown in Listing 3. As can be seen, the
annotation is carried out by means of tags recorded in the attribute @ana (short
for ‘analysis’)26 of the element <rdgGrp>. The tags are defined by the user27 and
can be divided into two main classes, corresponding to our distinction between
accidental and substantial variants. Among the former we include, for example,
variants of spelling, such as those of scriptio plena and defectiva (#scr_pl,
#scr_def).


Listing 3 Example of annotation (Q 1:1).


 1 <listApp>                                 27   </app>
 2   <app loc="1 1">                         28   <app loc="1 1">
 3     <lem>                                 29     <lem>
 4       <w>דברי</w>                         30       <w>קהלת</w>
 5     </lem>                                31     </lem>
 6     <rdgGrp ana="#typ">                   32     <rdgGrp ana="#sub #sem #nm">
 7       <rdg wit="#K136 #K139">             33       <rdg wit="#K121">
 8         <term>vox major et ornata</term>  34         <w>קהת</w>
 9       </rdg>                              35       </rdg>
10     </rdgGrp>                             36     </rdgGrp>
11     <rdgGrp ana="#typ">                   37   </app>
12       <rdg wit="#K1 #K2 #K3...">          38   <app loc="1 1">
13         <term>non major</term>            39     <rdgGrp ana="#scr_pl #nm">
14       </rdg>                              40       <rdg wit="#K57 #K100 #K260">
15     </rdgGrp>                             41         <w>דויד</w>
16   </app>                                  42       </rdg>
17   <app loc="1 1">                         43       <rdg wit="#K141">
18     <lem>                                 44         <term>forte</term>
19       <w>דברי</w>                         45       </rdg>
20       <w>קהלת</w>                         46     </rdgGrp>
21     </lem>                                       ...
22     <rdgGrp ana="#typ">                   47   </app>
23       <rdg wit="#K4 #K109">                      ...
24         <term>lit. majorib.</term>        48 </listApp>
25       </rdg>
26     </rdgGrp>


26 The attribute @ana is used to express analyses or interpretations on parts of text, see:
accessed February 26, 2021, https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.analytic.html.

27 In TEI these have been defined in a list of <interp> elements, which are used for marking
interpretative annotations, see: accessed February 26, 2021,
https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-interp.html.

Among the latter we include phenomena of addition (#add), deletion (#del),
substitution (#sub), and transposition (#tran) of word(s). In the fragment of
code shown in Listing 3, for example, the first three variant readings have been
classified as typographical (#typ), since they refer to details of mise en page of the
text; the last two, on the other hand, have been annotated, respectively, as cases of
substitution and scriptio plena of nouns (#sub #sem #nm, #scr_pl #nm).

The basic assumption of the proposed distinction is that accidentals are more
easily prone to random variation than substantials.28 As a consequence, they may
represent a problem for quantitative analysis, when we consider that they are the
most frequent in the textual tradition.29 Substantial variants, on the contrary, are
assumed to be safer as indicators of genealogical kinship, and can offer a more
solid basis for inferring descent patterns.30

Besides accidentals, we excluded two other categories of variants: sec- 
ond-hand variants (nunc), which may derive from external sources such as other 
codices or massoretic lists, and hence be the product of contamination; and 
variants attested in one witness only (lectiones singulares), which are useless in 
establishing relationships. Dubious variants (videtur, forte) and readings super- 
imposed over erasures (sup. ras.) are excluded as well. 
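
The filtering criteria can be expressed as a small predicate over the annotated reading groups. The sketch below is in Python, not the XSLT we used; the set of excluded tags is limited to those shown in Listing 3, and the spelling '#signif' for the override tag described in note 28 is an assumption. Exclusion of second-hand and dubious readings is left out for brevity.

```python
import xml.etree.ElementTree as ET

EXCLUDED = {"#typ", "#scr_pl", "#scr_def"}  # typographical and spelling variants
OVERRIDE = "#signif"  # assumed spelling of the tag that protects a variant


def is_significant(rdg_grp):
    """Decide whether a <rdgGrp> should enter the data matrix."""
    tags = set(rdg_grp.get("ana", "").split())
    if OVERRIDE in tags:
        return True
    if tags & EXCLUDED:
        return False
    # Lectiones singulares: a reading carried by one witness only is skipped.
    witnesses = [w for rdg in rdg_grp.iter("rdg") for w in rdg.get("wit", "").split()]
    return len(witnesses) > 1


app = ET.fromstring(
    '<app>'
    '<rdgGrp ana="#typ"><rdg wit="#K4 #K109"/></rdgGrp>'
    '<rdgGrp ana="#sub #sem #nm"><rdg wit="#K121"/></rdgGrp>'
    '<rdgGrp ana="#sub"><rdg wit="#K1 #K2"/></rdgGrp>'
    '<rdgGrp ana="#scr_pl #nm #signif"><rdg wit="#K57 #K100"/></rdgGrp>'
    '</app>'
)
kept = [grp.get("ana") for grp in app.iter("rdgGrp") if is_significant(grp)]
```

In this invented fragment only the shared substitution and the explicitly protected spelling variant are kept; the typographical group and the singular reading of K121 are filtered out.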


28 This applies in particular to variants related to the use of matres lectionis. Other accidentals, 
such as some kinds of graphic variants (graph), can be considered genealogically significant. 
An example is the alternation between שכלות/סכלות found in different passages in Q and perhaps
attested in some of the ancient versions (see Q 1:17). This and other instances can be included in
the analysis either by modifying the annotation (for example, from #graph to #sub, when the
variant involves a possible change in meaning, as in the example just mentioned) or by applying
specific tags which prevent the variant from being filtered during the XSLT transformation phase
(§ 3.3). In our annotation, the tag signif (short for ‘[genealogically] significant [variant]’) is
devoted to this task.

29 Sixty-three percent of the variants of Q are accidental according to our findings. Of these, 96%
are variants of scriptio plena and defectiva.

30 Of course, both the decision to filter accidentals, as well as to apply particular annotations
to carry out the regularization, are, of necessity, the result of subjective interpretations about the
logic of the copying process and the nature of textual change, and are as such questionable. The
annotation scheme, on the other hand, has the advantage of making such interpretations explicit,
thus allowing for the possibility of an intersubjective control. Moreover, this scheme allows
for a customizing of the selection of kinship-revealing variants and permits us to automate the
regularization pursuant to the criteria chosen.
3.2 Witnesses


The reliability of K's collation has occasionally been questioned by scholars.31 The
criticisms concern in particular the lack of distinction between first and second
hand,32 and more generally errors and inaccuracies in reporting variant readings
in the critical apparatus. Clearly, if criticisms of this nature were to prove true to
such an extent that most variants turned out to be either missing or spurious in the
apparatus of Q, then the analysis would be completely unreliable. For this reason,
we decided to re-examine a sample of 59 manuscripts33 classified in K as fully
collated, so as to get at least a rough estimate of the accuracy of the collation as a
whole.34 Our examination proved quite satisfactory (86% accuracy) and seems to
suggest that K's apparatus can be considered a reliable source, at least as far as
substantial variants of Q are concerned.35

Once we corrected the errors and integrated the missing data, we went on to
collate six very ancient Oriental manuscripts not found in K at all. All fragmentary
manuscripts as well as manuscripts classified as partially collated were excluded.36


3.3 Data Matrix 


After refining the data and selecting the most relevant witnesses, we wrote an XSL 
stylesheet in order to convert the variants into numerical values, and inserted them 
into a data matrix suitable for computational analysis (Figure 3). 


31 See Barthélemy, Critique textuelle de l'ancien Testament, 1. Josué-Esther, 34 ff.; Chiesa, Filolo- 
gia storica della Bibbia Ebraica, 140 ff.; R.S. Hendel, The Text of Genesis 1-11: Textual Studies and 
Critical Edition (New York and Oxford: Oxford University Press, 1998), 109 ff. 

32 A fact already known to DR, see De Rossi, Variae lectiones Veteris Testamenti, 1:xlvi-xvlii. See
also Barthélemy, “Les manuscrits médiévaux et le texte tibérien classique,” xxvv ff.

33 No particular criterion was followed for sampling the manuscripts other than that of availa- 
bility in digital format. 

34 The accuracy is calculated as the ratio of the variants recorded by K to the total number of 
variants present in the manuscripts according to our surveys. The estimate refers to substantial 
variants only. 

35 Most inaccuracies concern the fact that K does not adequately report whether a variant has 
been later corrected according to the received text. However, since here we take into considera- 
tion the variants of the first hand and disregard corrections, these inaccuracies are insignificant 
for stemmatic analysis. Cases in which substantial variants are missing or erroneous are rarer 
and have been corrected. 

36 The list of witnesses divided by collation degree is in Kennicott, Vetus Testamentum Hebra- 
icum, 2:572. 
                1         2         3         4         5         6         7         8
Taxon  12345678901234567890123456789012345678901234567890123456789012345678901234567890...
H      000000000000000000000000000000000000000000000000000000000000000000000000000000000
BOL    000000000000000000000000000000000000000000000000000000000000000000000000000000000
3IAK1  000000000000000010001000000000000000010000000001000000000000000000001000000000010
3SK2   0000010000000100000010000000000000000000000000000000000000000000001000000000000000
4SK3   000000000000000000000000000100000000000000000000000000000000000000000000000000000
2AK4   00000101000000000000001000000000000100000000000000000000000000001000001000000000
4SK14  000100000000000000010000000000000000000000000000001000000000000000000001000000000
3AK17  00000100000100000001001000000000000101000000000000000000000001000000001000000000
3AK18  001000010000100001001101000000001010100000100000000000000000000001100001100000000
4SK19  00000000001000000000000000000000000100000000000000000000000000000000000000010000
2AK36  00000010001000000100001000000000000000000000000010000002000000000000001000000000
80K31  00000000000000000001000000000000000000000000000000000000000000000000000000000000
3AK50  0000010000000000000000000000000000000000000000000000000000000000000000000000000000
3AK56  000000000000000000010000001000000000110100000000000000000001000000000010000000000
4SK67  000000000000000000000000000000000000000000000000000000000000000000000000000000000
2AK76  00000001000000000000001001000000001001100000000000000000000000000000000010000000


Figure 3: Fragment of data matrix. 


In the matrix the witnesses are represented in rows, while places of variation 
are represented in columns as a vector of characters. Characters contain differ- 
ent values, called character states, which represent different readings: ‘0’ if the 
witness does not show any variant (that is, if it reads as the reference text), ‘1’ if it 
has the reading of the first group (the element <rdgGrp> used in the encoding), 
‘2’ of the second group, and so forth. In total, the matrix consists of 116 witnesses 
and 371 places of variation. 
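
The coding of the character states can be sketched as follows. This is a Python illustration, not our XSL stylesheet; the input structure (each place of variation as a list of reading groups, each group as a list of witness sigla) and the function name build_matrix are assumptions, and the single-digit coding only works for fewer than ten groups per place.

```python
def build_matrix(witnesses, places):
    """Return one string of character states per witness.

    '0' means the witness agrees with the reference text at that place,
    '1' that it carries the reading of the first group, '2' of the second, etc.
    """
    rows = {witness: "" for witness in witnesses}
    for groups in places:
        state = {}
        for index, group in enumerate(groups, start=1):
            for witness in group:
                state[witness] = index
        for witness in witnesses:
            rows[witness] += str(state.get(witness, 0))
    return rows


# Invented example: two places of variation, three witnesses.
places = [
    [["K1"]],                # place 1: one group, attested by K1 only
    [["K2", "K3"], ["K1"]],  # place 2: two competing groups
]
matrix = build_matrix(["K1", "K2", "K3"], places)
```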


3.4 Phylogenetic Analysis Through Maximum Parsimony 


In order to compute genealogical relationships we employed phylogenetic methods
proper to evolutionary biology.37 Such methods permit the separation of witnesses
into discrete groups according to inherited variants, and the representation of their
relationships in the form of tree-like graphs, named phylogenetic trees.

Three main components can be distinguished in a phylogenetic tree: (1) the
terminal nodes or leaves, representing the extant witnesses (in phylogenetic terms,
the taxa); (2) the internal nodes or internodes, representing hypothetical witnesses
from which the extant witnesses are presumed to descend; and (3) the tree branches
or edges, which identify genealogical relationships.


37 We followed the methodology illustrated in A.-C. Lantin, P. Baret, and C. Macé, “Phylogenetic
Analysis of Gregory of Nazianzus' Homily 27,” in Le poids des mots. Actes des 7èmes Journées
Internationales d'Analyse statistique des Données Textuelles, vol. 2 (Louvain-la-Neuve: JADT, 2004),
700-707. The appendix to the article shows the procedures applied to operate the phylogenetic
software PAUP (see note 39), which we also followed.


318 — Luigi Bambaci

When the information contained in the matrix is not conflicting, a phylogenetic
tree will appear as fully dichotomous (or bifurcating): two witnesses are
connected to a common ancestor, which is in turn connected to the common
ancestor of other witnesses, all the way to the tree root. The similarity of
phylogenetic trees to traditional stemmata codicum, however, must not lead us to
jump to the conclusion that the first manuscript contains all the readings of the
second, the second all the readings of the third, the third of the fourth, and so
on, i.e., we should not automatically, as it were, read a phylogenetic tree like
a proper stemma. Phylogenetic reconstruction of a family of manuscripts does
not necessarily depend on the majority readings that are found in that given
family.38 Instead, when in a phylogenetic tree we find a group of witnesses
clustered around one or more ancestors, it means that that group inherited from its
respective ancestors the readings that require the least number of changes over
the whole tree, as we are about to demonstrate.

Different tree-building methods exist in bioinformatics. We chose to use
Maximum Parsimony (MP) as implemented in PAUP, a widely used software package for
phylogenetic analysis.39

According to MP, if two or more taxa share the same traits (the character
states in the matrix of Figure 3), then it is far more likely that they were
inherited from a common ancestor, rather than arising independently, each time in a
different taxon. This last scenario, which goes in biology under the general
name of homoplasy, is less parsimonious because it implies multiple changes at a
time. MP tries to limit the amount of homoplasy as much as possible, by searching
for the tree that minimizes the number of changes required to explain the actual
character distribution.40

Given a character matrix, the basic implementation of MP must address two 
sub-problems: (1) generating the list of all possible trees; and (2) calculating the 
number of changes for each tree, or tree length. 


38 See M. Spencer, C.J. Howe, and K. Wachtel, “The Greek Vorlage of the Syra Harclensis: A Com- 
parative Study on Method in Exploring Textual Genealogy," TC: A Journal of Biblical Textual Crit- 
icism 7 (2002), http://rosetta.reltech.org/TC/v07/SWH2002/index.html. 

39 Accessed February 26, 2021, https://paup.phylosolutions.com/. 

40 A general overview on MP analysis can be found in D.A. Baum and S.D. Smith, Tree Think- 
ing: An Introduction to Phylogenetic Biology (New York: Macmillan Learning, 2013), 173-207. For 
a more detailed description with some PAUP running code see D.L. Swofford and J. Sullivan, 
“Phylogeny Inference Based on Parsimony and Other Methods Using PAUP,” in The Phylogenetic 
Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing, ed. P. Lemey, M. 
Salemi, and A.-M. Vandamme (Cambridge: Cambridge University Press, 2009), 267-312. 
As far as the first task is concerned, there are two main searching methods:
exhaustive and heuristic. The first method generates trees by exploring all
possible combinations of taxa; the second starts by selecting an initial tree and then
modifies it by using various strategies to minimize its length. However, since the
number of trees rapidly increases with the number of taxa, as in our case, heuristic
methods are more often employed: although they do not absolutely guarantee
the discovery of the optimal tree, they have proven to yield good approximations
of the most parsimonious ones. For our case study, the tree bisection-reconnection
method (TBR) is used, which is set by default in PAUP.41
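
How rapidly the number of trees grows can be made concrete with the standard count of unrooted, fully bifurcating trees on n labelled taxa, (2n - 5)!! = 3 x 5 x ... x (2n - 5) for n >= 3. The following Python sketch simply evaluates this product; the function name is our own.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted bifurcating trees on n labelled taxa: (2n - 5)!!"""
    if n_taxa < 3:
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


small = [unrooted_tree_count(n) for n in (4, 5, 10)]
large = unrooted_tree_count(116)  # size of our matrix; far too many to enumerate
```

With four taxa there are only three trees, but with ten there are already more than two million, which is why an exhaustive search is out of the question for a matrix of 116 witnesses.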

Turning now to the question of tree length, this is computed, in essence, by 
counting how many times a character state changes when going from one node to 
another. This calculation is carried out for every node on every single character in the 
matrix. At the end, the number of total changes (or steps) required by all the char- 
acters is summed up and this value is considered to be the length of that given tree. 
The tree that has the minimum length is then selected as the most parsimonious. 

An example may help clarify the steps involved in the building of a
phylogenetic tree through MP. Suppose we have a textual tradition with four witnesses
(A, B, C, D) and three places of variation, each with two competing readings (0 and 1), as
shown in Table 1.


Table 1: Example of character matrix.


taxon   char.1   char.2   char.3
A       1        1        1
B       1        0        1
C       0        1        0
D       0        0        0

With four witnesses, three combinations are possible: (1) AB|CD; (2) AC|BD; and
(3) AD|BC. For each of these trees, the number of changes of every single character
in the matrix is calculated as follows: first, the tree is rooted at an arbitrary
point (in PAUP the rooting is done on the first taxon of the matrix); then, if two
witnesses have a reading in common in a given place of variation, this is assigned
to the common ancestor; otherwise, if the witnesses have two different readings,
both are assigned to the ancestor and the tree length is increased by one.42


41 An explanation of the TBR algorithm can be found in the PAUP manual, see note 39.

42 This is the basic description of Fitch's algorithm, which presupposes that character states
are unordered and that the cost of each character step is always equal to one. A more detailed
description of this and other algorithms available in PAUP can be found in Swofford and Sullivan,
“Phylogeny Inference,” 270-77.


An illustration of the procedure carried out on the first tree is shown in
Figure 4.


character   A     B     C     D     steps
1           {1}   {1}   {0}   {0}   0+0+1=1
2           {1}   {0}   {1}   {0}   1+1+0=2
3           {1}   {1}   {0}   {0}   0+0+1=1


Figure 4: Character steps for tree no. 1.


As we can see, characters one and three change once, when going from the
ancestors to the root. Character two changes twice, which means that it fits less into the
tree than the other two. The total steps required by all the characters on tree one
is therefore equal to four. If we carry out the analysis on the other trees, we obtain
the values shown in Table 2.


Table 2: Length of each character state for each possible tree.


tree   char. 1   char. 2   char. 3   length
1      1         2         1         4
2      2         1         2         5
3      2         2         2         6


The first tree is the most parsimonious, because it accounts better for the overall
character distribution.

It is worth emphasizing, however, that the best tree is not optimal for all the
characters: as can be seen in Table 2, character two fits better with the second
tree, in which it changes just once. The first tree is better than the others, but it
still contains homoplasy.
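
The lengths reported in Table 2 can be recomputed with a short sketch of Fitch's counting procedure. It is written in Python and is not PAUP's implementation: trees are represented as nested pairs and rooted, for simplicity, between the two pairs of witnesses rather than on the first taxon, which does not affect the length when character states are unordered and each step costs one.

```python
def fitch(tree, states):
    """Return (set of possible states, number of steps) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The two descendants agree: no change is needed at this node.
        return common, left_steps + right_steps
    # They disagree: keep both readings and count one step.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Return the steps required by each character and their sum."""
    n_chars = len(next(iter(matrix.values())))
    per_char = []
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        per_char.append(fitch(tree, column)[1])
    return per_char, sum(per_char)


MATRIX = {"A": "111", "B": "101", "C": "010", "D": "000"}  # Table 1
TREES = {
    1: (("A", "B"), ("C", "D")),
    2: (("A", "C"), ("B", "D")),
    3: (("A", "D"), ("B", "C")),
}
lengths = {number: tree_length(tree, MATRIX) for number, tree in TREES.items()}
```

The dictionary lengths reproduces the rows of Table 2: steps 1, 2, 1 (length 4) for tree 1; 2, 1, 2 (length 5) for tree 2; and 2, 2, 2 (length 6) for tree 3.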
Furthermore, different reconstructions for character two are possible on the
first tree. This is illustrated in Figure 5, where the results of the ancestral
reconstruction (also known as character-state reconstruction or character
optimization) for the best tree are shown.


node                   ACCTRAN    DELTRAN
root                   {1,1,1}    {1,1,1}
ancestor of B, C, D    {1,0,1}    {1,1,1}
ancestor of C, D       {0,0,0}    {0,1,0}
C                      {0,1,0}    {0,1,0}
D                      {0,0,0}    {0,0,0}
B                      {1,0,1}    {1,0,1}
A                      {1,1,1}    {1,1,1}


Figure 5: Difference between ACCTRAN and DELTRAN optimization algorithms.


In the first scenario, character two changes twice: the first time from 1 to 0 between
the root and the ancestor 6; the second time from 0 back to 1 in C. In the second
scenario, two parallel changes occur, from 1 to 0 in both B and D. These two differ- 
ent character reconstructions correspond to two different optimization algorithms: 
the first, called accelerated transformation (ACCTRAN), assumes that changes 
occur as soon as possible in the tree, and implies character reversal; the second, 
called delayed transformation (DELTRAN), assumes that changes occur as late as 
possible, and implies parallelism.43 Here, we have preferred ACCTRAN, which is the
default transformation in PAUP. 
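
The two reconstructions can be recovered by brute force on this small example. The sketch below enumerates every assignment of 0 or 1 to the two internal nodes of the first tree and keeps the shortest ones. It is only an illustration, not PAUP's ACCTRAN or DELTRAN procedure: the node names are hypothetical, and fixing the root at state 1 is an assumption taken from the {1,1,1} shown at the root of Figure 5.

```python
from itertools import product

# Character two at the tips, as read from Figure 5.
TIPS = {"A": 1, "B": 0, "C": 1, "D": 0}

# First tree as parent -> children. The node names are hypothetical.
CHILDREN = {"root": ["A", "n1"], "n1": ["B", "n2"], "n2": ["C", "D"]}
INTERNAL = ["n1", "n2"]


def changes(states):
    """List every branch (parent, child, from, to) on which the state differs."""
    return [
        (parent, child, states[parent], states[child])
        for parent, kids in CHILDREN.items()
        for child in kids
        if states[parent] != states[child]
    ]


def shortest_reconstructions(root_state=1):
    """Try all 0/1 assignments to the internal nodes; keep the shortest ones."""
    candidates = []
    for combo in product((0, 1), repeat=len(INTERNAL)):
        states = {**TIPS, "root": root_state, **dict(zip(INTERNAL, combo))}
        candidates.append(changes(states))
    best = min(len(candidate) for candidate in candidates)
    return [candidate for candidate in candidates if len(candidate) == best]


for reconstruction in shortest_reconstructions():
    print(reconstruction)
```

Two reconstructions of length two survive. One places an early change below the root followed by a reversal in C, which is the pattern the text associates with ACCTRAN; the other places parallel late changes in B and D, the pattern associated with DELTRAN.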

And so, we have seen different ancestral reconstructions based on the same 
tree structure (or tree topology). In many cases, however, more than one parsi- 
monious tree is found post-analysis. For instance, imagine if we insert into the 
matrix shown in Table 1 a fourth place of variation with the same reading dis- 
tribution as the second: in that case, we will have two places containing con- 
flicting information. From this new matrix, two equally parsimonious trees with 
length 6 are generated: one with topology AB|CD and the other with topology 
AC|BD. In these cases a strict consensus tree can be computed, which will reveal 


43 Another algorithm is implemented in PAUP, called minimum-F value (MINF). See PAUP man- 
ual quoted at note 39. 
the patterns common to all the equally parsimonious trees generated, as shown
in Figure 6.44


C D B A 


Figure 6: Example of strict consensus tree with polytomies. 


As can be seen, the tree is not dichotomous, but presents polytomies, with more 
than two manuscripts descending from the same common ancestor. Such polyto- 
mies are usually interpreted as a sign of uncertainty in the data. 
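
Why the consensus collapses into a polytomy can be shown with a minimal sketch in which each tree is reduced to its set of non-trivial splits (bipartitions of the witnesses) and the strict consensus keeps only the splits found in every tree. The split-based encoding and the function names are assumptions for illustration, not PAUP's implementation.

```python
TAXA = frozenset("ABCD")


def split(side):
    """A bipartition of the witnesses, stored without regard to side order."""
    side = frozenset(side)
    return frozenset([side, TAXA - side])


def strict_consensus(trees):
    """Keep only the splits that occur in every one of the input trees."""
    common = set(trees[0])
    for tree in trees[1:]:
        common &= set(tree)
    return common


tree_ab_cd = {split("AB")}  # topology AB|CD
tree_ac_bd = {split("AC")}  # topology AC|BD

print(strict_consensus([tree_ab_cd, tree_ac_bd]))
```

The two topologies share no split, so the result is empty: nothing groups the four witnesses, and all of them descend from one unresolved node.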


3.5 Results 


The results of the analysis we carried out on K’s collation of Q are shown in the
phylogenetic tree in Figure 9 at the end of this article. The tree was generated by 
a strict consensus of about 25,000 equally parsimonious trees, following the pro- 
cedures described in the previous section. 

Once we obtained the tree topology, we proceeded to examine the ances- 
tral character reconstruction carried out by PAUP, in order to identify how many 
variants were assigned to the ancestors of groups of manuscripts. In particular, 
we tried to trace the so-called characteristic variants, i.e. ancestral variants that 
appear only once in a given group of manuscripts, or variants that appear in 
only a few groups. A list of the most relevant characteristic variants is shown in 
Table 3 in the Appendix. The result is that 104 out of 371 places of variation have 
been identified as ancestral and assigned to common ancestors. This means that 
most variants are classified as autapomorphies, that is, as secondary innovations 
introduced by the scribes of individual manuscripts. About 20 of these variants 
can be considered characteristic, since they appear only once, or at most, very 
few times, in the tree. 


44 Other kinds of consensus tree are implemented in PAUP in addition to the strict consensus, 
such as the majority rule consensus and Adam consensus, see PAUP manual quoted at note 39. 
The tree presents 14 groups in total. Most of these are dichotomous, but poly-
tomies are also found (such as in the fifth group of manuscripts 2AK168, 4SK144,
and 0OK31).45

Eleven witnesses could not be classified. These witnesses, however, have very 
few variants from the reference text used by K (§ 2.1), which represents the received
text or textus receptus (TR): here we find four Sephardic and four ancient Oriental
manuscripts, including the Leningradensis (0OL). It is not incorrect, therefore, to
consider them as representative of the TR group. 

With regard to the other groups found by the machine, some are not certain, 
because they share a low number of variants or because the variants in common 
are weak in terms of their ability to reveal kinship. One such group is the fifth 
mentioned above, which is based on a single ancestral variant. Other groups, on 
the other hand, seem more likely, since either the quantity or the quality of inher- 
ited variants allows us to infer a genealogical kinship. 

Let us take a closer look at the ancestral reconstruction of some of these 
groups.46 In Figure 7 the group with three Ashkenazic manuscripts is shown (the
sixth in the tree from the left). 


[Subtree for 3IAK1, 3AK111, and 2AK185 with a table of character states; the characters legible in the extraction are nos. 79, 102, 187, 312, 147, 150, 197, and 215.]


Figure 7: Ancestral reconstruction for 3IAK1, 3AK111, and 2AK185. 


According to the ancestral reconstruction, these manuscripts inherited from their 
common ancestor (the internal node no. 118) a total of four readings. The man-
uscripts 3IAK1 and 3AK111 have four other readings in common, which are not


45 In the witness sigla, K’s catalogue number is preceded by a number that identifies the century
(‘0’ for eleventh century, ‘1’ for twelfth century etc.) and by an upper-case letter that identifies
the script (‘A’ for Ashkenazic, ‘S’ for Sephardic, ‘I’ for Italian, ‘IA’ for Italian-Ashkenazic, and ‘O’ 
for Oriental). Printed editions are marked with ‘E’. This system is designed to make it easier to 
identify the witnesses and to read the information contained in the tree. 

46 A more detailed analysis of the ancestral reconstruction can be found in the article quoted 
at note 25. 
found in 2AK185 and are assigned to another ancestor (internal node no. 117). The
first reading (no. 79)47 can be considered characteristic of this group, since it is
found here and in two other manuscripts, with many variants, belonging to other 
branches of the tree (2AK158 and 3UK218). 

As stated in Section 3.4, the best-fit algorithm underlying the ancestral recon- 
struction does not always assign to common ancestors the readings that are found 
in the majority of descendant witnesses. See for example Figure 8, in which the 
result of the ancestral reconstruction for three Italian manuscripts 1IK226, 2IK227, 
and 4IK225 is shown. 


Readings assigned to node 168:

char.   1IK226   2IK227   4IK225
6       1        1        1
30      1        0        1
45      1        0        1
258     1        1        1
269     1        1        0
312     0        0        0
362     1        1        0

Readings assigned to node 167:

char.   2IK227   4IK225
20      1        1
124     1        1
320     0        0


Figure 8: Ancestral reconstruction for 1IK225, 2IK227, and 4IK226.


Seven readings are assigned to the common ancestor 168. As can be seen, however, 
only three variants are actually common to all three manuscripts (nos. 6, 258, 312). 
The other readings are found in 1IK226 and 4IK225 (nos. 30, 45) or in 1IK226 and 
2IK227 (nos. 269, 362). Manuscripts 2IK227 and 4IK225 share three other readings 
not found in 1IK226 (nos. 20, 124, 320), which are assigned to node 167. Both the 
distribution of readings and the fact that readings 30 and 45 are characteristic for 
1IK226 and 4IK225 (see Table 3 in the Appendix) caused the three manuscripts to 
be lined up in this way by the algorithm. Given the common readings and script, a 
genealogical kinship is likely here and must be verified. 

MP analysis leads to the likely conclusion that other witnesses, too, are gene- 
alogically interrelated. 

As shown in Table 3, the largest number of characteristic variants are found in 
pairs of manuscripts of the fourteenth and last group. These manuscripts present 
a high number of variants with respect to the TR and seem to belong to that branch


47 The variant is in Q 3:11 and regards the omission of the conjunction x». 
of the textual tradition usually identified as non-receptus or anti-receptus.48 The
predominance of the Italian-Ashkenazic element in this group is remarkable. 


4 Discussion 


In Section 2 we described the creation of a parser for encoding K’s collation of the
HB. The parser enabled us to encode thousands of variants fully automatically,
thus allowing us to avoid the complex task of manually encoding the critical appa-
ratus. The results prove that a rule-based parsing system works well and suggest
that such a system can be extended to include all the other sections of K’s work.

As we mentioned in Section 2.2, the parser has proved to be a powerful tool in 
surmounting the hurdles presented by the first and perhaps most delicate chal-
lenge we confronted it with, namely, digitization through OCR software.

Digitizing critical apparatuses is challenging because of the specialized lan- 
guage that characterizes them. Errors generated by OCR are difficult for the human 
eye to detect, and manual correction is extremely time-consuming. Thanks to the 
CFG, a number of common errors, such as substitutions, omissions, and misdivi- 
sions of the apparatus, can be easily detected owing to the lexical and syntactic 
analysis provided by the lexer and the parser. This ensures a tight control over the 
structure of the critical apparatus, which allows for a correct identification of the 
function of its various components. The CFG also makes it possible to spell-check
literal values, such as separators, technical terms, and witness sigla.49

The most important limitation of this system, as can be imagined, is repre- 
sented by the impossibility of linguistically checking the Hebrew readings. In 
fact, Hebrew readings cannot be validated by the CFG: as shown in the fragment
of CFG in Listing 1, Hebrew tokens are described as simple sequences of Unicode
alphabetic characters (‘[\u0590-\u05ff]+’), which means that it is not pos-
sible to spellcheck with the lexer whether they are ‘legal’ Hebrew words or not.


48 See Sacchi, "Analisi quantitativa," 8 ff.; M.H. Goshen-Gottstein, “The Rise of the Tiberian 
Bible Text,” in Biblical and Other Studies, ed. A. Altmann (Cambridge, MA: Harvard University 
Press, 1963), 108 ff. 

49 In the sample CFG shown in Listing 1 and in the final version available on GitHub (see note 7),
the witness sigla are encoded as simple alpha-numeric characters. However, it is possible to cre- 
ate a rule in which the witness sigla used by K are encoded as literal values: in this way, the 
lexer will throw an error if a siglum does not match the corresponding value in the list of witness- 
es. For the Megillot, the list of witnesses can be acquired from Kennicott, Vetus Testamentum 
Hebraicum, 2:572. 
At this stage, this task entirely depends on the language model built into OCR soft-
ware, and the results must all be verified by direct human oversight. The imple- 
mentation of a spell-checker in post-processing would be required if we want to 
improve the accuracy of the transcription up to this level. 
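
The limitation can be seen in a minimal sketch that uses the same character class as the token rule quoted above. The function name is hypothetical and Python's `re` module stands in for the lexer: the pattern only checks that every code point lies in the Unicode Hebrew block, so a nonsense string passes as readily as a real word.

```python
import re

# Same character class as the token rule: any run of code points in the
# Unicode Hebrew block (U+0590 to U+05FF).
HEBREW_TOKEN = re.compile(r"[\u0590-\u05ff]+")


def is_hebrew_token(text):
    """True if the whole string consists of Hebrew-block code points."""
    return HEBREW_TOKEN.fullmatch(text) is not None


print(is_hebrew_token("ישראל"))   # a real word: accepted
print(is_hebrew_token("ששששש"))   # a nonsense string: accepted all the same
print(is_hebrew_token("israel"))  # Latin letters: rejected
```

A dictionary lookup or spell-checker applied in post-processing would be needed to tell the first two cases apart.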

In Section 3, we illustrated how data from encoded critical apparatuses have 
been used in computer-assisted stemmatic analysis. The task of classifying Hebrew 
biblical manuscripts is enormous and poses a number of problems to the investi- 
gator. 

The first such problem concerns the paucity of genealogically significant 
variants, a fact common to traditions of authoritative texts characterized by a 
process of controlled transmission. As we have seen, the great majority of var- 
iants regard minutiae, such as the alternation of scriptio plena and defectiva, 
which can hardly be considered as indicators of genealogical kinship. As to the 
remaining variants that have been taken into account in the present study, not 
all can be considered as equally kinship-revealing: most are probably too weak, 
since they concern particles or other parts of speech that are frequently prone to 
random variation. 

Another problem is represented by the phenomenon of horizontal transmission 
or contamination typical of open traditions, which cannot be properly handled with 
the genealogical approach adopted here. 

Finally, one has to take into account the possibility of collation errors and, 
more generally, all the limits inherent in using works that are now, to say the least, 
dated, namely the collations of K and DR (§ 3.2).

In light of all these considerations, one can understand why previous attempts
to discover groups or families of biblical manuscripts have been declared mostly
unsuccessful, and why scholars are generally of the opinion that such classifica-
tion is impossible.50

The problems and limitations mentioned above also emerge from the results 
of the phylogenetic experiment illustrated in the previous section. The tree of 
Figure 9 is not the best tree that can be achieved with MP analysis: as stated in 
Section 3.4, the most parsimonious tree cannot be arrived at, due to the size of the 
matrix and the nature of the data, as well as to the limits in computational power 
of modern computers. Most variants are useless for identifying families of manu- 
scripts, and only a tiny minority can be considered as characteristic. 


50 See for example J.W. Wevers, “A Study in the Hebrew Variants in the Books of Kings,” Zeitschrift
für die Alttestamentliche Wissenschaft 61, no. 1 (1948): 43-76 and Barthélemy, “Les manuscrits
médiévaux.” Cf. on the other hand Sacchi, “Analisi quantitativa.”
This does not mean, however, that all attempts at stemmatic analysis of the
medieval biblical tradition are to be judged as doomed a priori to failure, nor
“that certain individual manuscripts could not be judged as stemmatically con-
nected to one another,” as pointed out by Goshen-Gottstein.51 The method we
pursued did return groups of manuscripts divided by their ancestral variants, and 
sometimes even by their characteristic variants: some groupings rest on fragile 
foundations and are likely specious; others, however, seem more than plausible 
and need further study. The phylogenetic reconstruction which we have proposed 
is thus to be considered as a working hypothesis, to be verified by qualitative 
analysis. 

Further research is needed if we want to reconstruct the genealogy of the 
medieval tradition of the HB. First of all, other manuscripts, more ancient or rele- 
vant from a text-critical standpoint, need to be added, as we ourselves have done 
to a certain extent. Other categories of potentially monogenetic variants need to 
be included, such as variants of punctuation and the Massora.52

As to methodology, we envision two strategies to improve results. One may 
contemplate, for example, the development of a system of weighting factors, so 
that readings judged as having greater kinship-revealing power (e.g. root substi- 
tutions) are assigned greater weight in building the tree. 
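
The effect of such weights can be sketched on the toy example of Section 3.4, using the step counts of Table 2. The weight values below are purely hypothetical, and the weighted length is taken to be the sum of each character's steps multiplied by its weight, which is an assumption about how weighting would be applied.

```python
# Steps per character for each tree, taken from Table 2.
STEPS = {1: (1, 2, 1), 2: (2, 1, 2), 3: (2, 2, 2)}


def weighted_length(steps, weights):
    """Tree length when each character's steps are multiplied by its weight."""
    return sum(step * weight for step, weight in zip(steps, weights))


def best_tree(weights):
    """Number of the tree with the smallest weighted length."""
    return min(STEPS, key=lambda tree: weighted_length(STEPS[tree], weights))


print(best_tree((1, 1, 1)))  # equal weights: tree 1, as in Table 2
print(best_tree((1, 3, 1)))  # hypothetical triple weight on character two
```

With equal weights tree 1 is shortest (4 against 5 and 6). If character two is weighted three times as heavily, tree 2 becomes shortest (7 against 8 and 10), which shows how strongly the choice of weights can steer the result.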

Another strategy would be to establish an order in variation that would better 
reflect the logic of the copying process. Such a strategy could prevent some read- 
ings from being misclassified as ancestral to others (as in the case of readings 
due to homoteleuton being classified as ancestral to readings with intact text) and 
could thus help decrease the number of equally parsimonious trees that can be 
generated. 


5 Conclusion 


The collations of Kennicott and De Rossi represent our primary authorities on trans- 
mission of the Hebrew Bible from the Middle Ages onward. Since no other large- 
scale collations are available, we are obliged, still today, to rely on the efforts made 
by these two 18th-century scholars. Yet the obvious difficulty of dealing with the 
enormous mass of material they gathered has deterred contemporary scholars
from exploiting them as a direct source of text-critical and linguistic evidence.


51 M.H. Goshen-Gottstein, “The Textual Criticism of the Old Testament: Rise, Decline, Rebirth," 
Journal of Biblical Literature 102, no. 3 (1983): 394. 
52 As suggested by Barthélemy, “Les manuscrits médiévaux,” xxvii. 
Digital technologies and methods offer the scholar of the Hebrew Bible an
obvious route to pursue, since they are designed precisely to render large amounts
of data far more easily searchable and processable. In fact, our research was
carried out on a simple personal computer: it would have been impossible with
‘traditional’ methods. Further inquiries into the textual history of the Hebrew Bible
are scarcely imaginable without the use of the same digital tools which have
become the norm in other fields of textual research, first and foremost New Tes-
tament studies. The employment of quantitative, computer-assisted methodolo-
gies, in sum, is indispensable if we want to establish a stemma codicum of the
medieval tradition and to reconstruct the history of the biblical Hebrew text in the
Middle Ages.

In the present project, we have used natural-language processing tools to 
demonstrate how a critical apparatus can be encoded automatically through a 
rule-based parser; then, we processed the resulting encoded variant readings by 
means of phylogenetic analysis in order to explore the medieval textual tradition 
of a biblical book. 

Through this inquiry, we have, we hope, demonstrated that stemmatic rela- 
tionships among Hebrew witnesses do exist and can be traced, and that the skep- 
tical, indeed renunciatory, attitude dominant in this area in the scientific litera- 
ture is not always justified. We believe we have shown that, at the very least, the 
question needs to be reopened and discussed. 

A digital corpus of variant readings of the kind we created here would benefit 
not only stemmatology, but other research areas of the Hebrew Bible as well. 

In the area of textual history, for example, variants of Hebrew codices can 
be used to study the formation of the textus receptus as reflected in most printed 
editions.53


53 This line of research was carried out in particular by the Israeli school, see above all the works 
of M. Cohen: “Saipan sapan nou mon Dpnwm op ny anon angna n» van" (Ph.D. thesis, 
Hebrew University, Jerusalem, 1973) [Unpublished]; nnp vnvmsb noun nwitp 3372 AX TR” 
*oo3on in unix! PNAN, vol. 1, ed. S. Uriel (Tel Aviv: 27, 1979), 42-69; mmwyn IMITI mo mp" 
“Oman mn DRIP T ana vopon bw in NWD gp vy 1(1980): 123-82; *rroinonpn onn? 
1488 miwn w'io 0151 — Down Tnn bw nnvsan mmannan DYRT spnn 0107 5v in nn ADD 
Pex aa novus 5v, vol. 18-19 (Ramat Gan: Bar Ilan University Press, 1981), 47-67; now inn” 
*3"nm bw mvonn nma inrns qp nm mona in mam spna y, vol. 2, ed S Uriel (Ramat 
Gan: Bar Ilan University Press, 1981), 229-56, and J.S. Penkower: nnn nmns on 72 apy” 
*mbovnn masapnn (Ph.D thesis, Hebrew University, Jerusalem, 1982) [Unpublished]; “Rabbinic 
Bible," in Dictionary of Biblical Interpretation, vol. 2, ed. J.H. Hayes (Nashville: Abingdon Press, 
1999), 361-64; *The Development of the Masoretic Bible," in The Jewish Study Bible, ed. A. Berlin 
and M.Z. Brettler (Oxford and New York: Oxford University Press, 2014), 2077-84. 




For textual criticism, on the other hand, quantitative methods could explore 
the possibility of a pre-Masoretic origin of the medieval variants, by counting and 
weighing the common readings between Hebrew codices and ancient versions of 
the Hebrew Bible.54

Moreover, as suggested by Goshen-Gottstein, data from medieval manuscripts 
can be used to study the copying process in the Middle Ages, such as scribal habits 
and the genesis of common copying errors.55

In the area of codicology and paleography, a distributional study of variants 
would prove useful: given that the various ethno-geographic families (Ashkenazic, 
Italian, Sephardic, etc.) are often characterized in terms of the extent to which 
they differ from the textus receptus, quantitative criteria would serve as an auxil-
iary tool to classify manuscripts whose character is dubious or unknown.56

Finally, a digital corpus containing thousands of variants would of neces- 
sity benefit the field of linguistics as well, as it would permit a far more extensive 
investigation, for example, of variant spellings and orthography. 

Clearly, the data found in the 18th-century collations, and in particular in 
Kennicott, cannot be considered definitive. Kennicott's apparatus contains inac- 
curacies and lacunae which need correcting and integrating if our desire is to 
access a corpus of variant readings that is complete and truly reliable. But in the 
absence of an entirely new collation of Hebrew witnesses that could overcome 
such deficiencies, the digitization of the data in the classic collations makes an 
excellent starting point for the gathering of new material from which to acquire 
new insights and to foster new research perspectives in Hebrew Bible studies. 


54 A survey of these studies can be found in Barthélemy, “Les manuscrits médiévaux,” xix-xxvii.
These are: J. Hempel, “Chronik,” Zeitschrift für die Alttestamentliche Wissenschaft 48 (1930):
187-206; J. Hempel, “Innermasoretische Bestätigungen des Samaritanus,” Zeitschrift für die
Alttestamentliche Wissenschaft 52, no. 1 (1934): 254-74; Wevers, “Hebrew Variants in the Books 
of Kings”; M.H. Goshen-Gottstein, “Die Jesaiah-Rolle und das Problem der hebräischen Bibel- 
handschriften," Biblica 35, no. 4 (1954): 429-42; H. Gese, “Die hebräischen Bibelhandschriften 
zum Dodekapropheton nach der Variantensammlung des Kennicott,” Zeitschrift für die Alttesta- 
mentliche Wissenschaft 69, no. 1-4 (1957): 55; and Sacchi, “Analisi quantitativa." 

55 M.H. Goshen-Gottstein, “Hebrew Biblical Manuscripts: Their History and Their Place in the
HUBP [Hebrew University Bible Project] Edition,” Biblica 48 (1967): 49.

56 As has been done, for example, by Penkower, see J. Penkower, mann jn mnn bw "now 1-103” 
*(3p r-an3) »&my ja bwn nv mwyn, yan 58, no. 1 (1988): 47-74, and “A Sheet of Parch- 
ment from a 10th or 11th Century Torah Scroll: Determining Its Type among Four Traditions (Ori- 
ental, Sefardi, Ashkenazi, Yemenite)," Textus 21, no. 1 (2002): 235-64. 
Figure 9: Strict consensus tree of the textual tradition of Qohelet according to Kennicott’s
collation. 
Appendix


In the following table the characteristic variants are listed. We used the consist-
ency index (CI) in order to establish whether a variant can be considered char-
acteristic and kinship-revealing. In phylogenetic analysis the CI is the measure
of how much a given character fits a given tree topology: the maximum value is
one and means that the character is perfectly consistent with the tree, because it
changes only once; values less than one mean that more changes occur.57 Applied
to variants, a CI equal to one basically means that a variant occurs in one position
only in the tree; a lower value means that it is also found in other branches of the
textual tradition.

In the table the variants are sorted by CI in descending order. Only witnesses
with variants based on CI between 1 and 0.500 are shown.
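
For orientation, the values in the table can be related to the usual definition of the index, CI = m / s, where m is the minimum number of steps a character could require on any tree and s is the number of steps it requires on the tree at hand. The sketch below applies that formula; the function name is hypothetical, and treating a two-state variant as having m = 1 is the standard convention, not something specific to this dataset.

```python
def consistency_index(min_steps, observed_steps):
    """CI = m / s for a single character on a given tree."""
    return min_steps / observed_steps


# A two-state variant (m = 1) that changes once, twice, or three times.
for observed in (1, 2, 3):
    print(observed, round(consistency_index(1, observed), 3))
```

A variant that changes once therefore scores 1.000, one that changes twice 0.500, and one that changes three times 0.333. On the same definition, a value such as 0.667 would correspond to a character with a minimum of two steps that changes three times.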


Table 3: List of characteristic variants. 


witnesses place variant CI 
2AK4 2AK30 6:3 bow 1.000 
6:5 vnv > wawn 1.000 
8:9 nwyi nvynn 0.500 
E271 E275 12:1 "» "—n 1.000 
2AK76 2UK177 2:26 nym- 1.000 
2AK77 2AK107 4:4 IRS - 1.000 
2:14 ny —> NYT 0.400 
2AK77 (2AK107) 11K180 5:6 »o- 0.500 
5:5 5x1 > IN 0.333 
2AK80 2AK151 7:21 bb — 53 1.000 
2AK95 3UK152 2:15 UN — UN DI 0.667 
3:19 mmnpo- 0.500 
6:9 nn T5nn 0.500 
8:4 nobv — podwi 0.333 
9:9 Ton unn 0.333 
21K225 21K226 2:4 "npoi  ^nyon 0.500 
2:10 55n— 553 0.333 
3AK18 1ASS882 12:9 ]pn — pm 0.500 
2:5 "npon — nyo 0.333 
9:9 anw 59 > - 0.333 
12:9 Pn > pom 0.333 
3AK212 3AK384 8:9 nwyn nvynn 0.500 
5:18 D’DIN > T1191 moon 0.333 


57 See Baum and Smith, Tree Thinking, 93 ff. 
Table 3 (continued)


witnesses place variant CI 
2AK117 2AK187 6:10 Nb — Nb 0.500 
1AK201 11K224 5:1 nwyn nvynn 0.500 

811  —- 0.333 
4SK173 3IK240 12:8 Son dan — moan dan san ban 0.500 
4IK213 3UK218 7:12 bya — MYI nr 0.500 
Avi Shmidman
Automatic Identification of Biblical Citations 
and Allusions in Hebrew Texts 


Abstract: The Hebrew text of scripture forms a foundational piece of shared 
culture for Hebrew writers. Hebrew texts through the ages, whether medieval or 
modern, virtually all draw upon the Hebrew Biblical text in one way or another. 
The identification of these Biblical citations and allusions is generally a primary 
prerequisite for the analysis or publication of the text. The automation of this 
process has long been a desideratum among scholars; however, for many years, 
an effective solution proved elusive. 

To be sure, exact citations of Biblical prooftexts are trivial to identify: they are 
generally introduced with a stock phrase such as “as it is written", and the exact 
reproduction of the Biblical texts allows for a quick and easy lookup to deter- 
mine the source of the quote. However, the overwhelming majority of Biblical text 
reuse is not in the form of exact citations, but is rather found as a reworking and 
adaptation of the Biblical text, adjusting the text to fit the new context, while still 
maintaining a set of linguistic cues in order to keep the connection to the origi- 
nal verse. For instance, a case of Biblical allusion will often be based upon the 
reuse of a set of lexemes from a given verse, even as those lexemes are adjusted 
in terms of their tense, person and gender. Additionally, prefixes and suffixes 
may be added, dropped or altered; additional words may be interpolated; and the 
order of the lexemes may be altered as well. 

In this chapter, I present a new algorithm for the identification of Biblical allu- 
sions, designed to address all of these challenges. 


Keywords: Hebrew scripture, natural language processing, approximate match- 
ing, intertextuality 
1 The Challenge


Hebrew Scripture forms a foundational piece of shared culture for Hebrew writers. 
Hebrew texts through the ages, both medieval and modern, have often drawn 
upon the biblical text in one way or another. The identification of these biblical 
citations and allusions is a primary prerequisite for the analysis or publication of 
the texts. 

To be sure, exact citations of biblical verses are fairly trivial to identify, using 
standard string-searching algorithms.1 However, the overwhelming majority
of biblical text reuse is not in the form of exact citations. Rather, when reusing 
biblical material, authors of Hebrew texts will generally rework the biblical 
text, adjusting it to fit the new context. The lexemes of the original verse may 
be altered in terms of their tense, person and gender; individual words may be 
added, dropped, or altered; prefixes and suffixes may be altered; and the order of 
the words may be inverted as well. To take one example, consider the following 
line from S.Y. Agnon’s novella, “In the Prime of Her Life”: ותתחזק אמי ותשב על המטה
(“my mother gathered her strength and sat up on the bed”).2 This line is patterned
after Genesis 48:2: ויתחזק ישראל וישב על המטה (“Israel gathered his strength and
sat up on the bed”). However, the first and third words have been altered from
masculine to feminine (ויתחזק → ותתחזק, and וישב → ותשב), and the second word,
“Israel,” has been swapped out for “my mother.”

Furthermore, biblical allusions can be very short, sometimes consisting of
no more than two words.3 For example, Agnon writes: וכחה ברח כצל (“and her
strength fled like a shadow”).4 The simile here derives from Job 14:2: ויברח כצל ולא

1 For an overview of exact-match search algorithms, see: Dan Gusfield, Algorithms on Strings, 
Trees, and Sequences (Cambridge: Cambridge University Press, 1997), 5-69. To be sure, when it 
comes to the biblical text, there is the additional complication of non-normative defective and 
plene orthography. Citations of verses in later texts often adopt normalized spellings, and thus, 
in order to locate all exact citations, these orthographic variants must be taken into account. For 
more regarding this point, see below, note 9. 

2 S.Y. Agnon, Al Kapot Ha-man’ul (Jerusalem and Tel-Aviv: Schocken, 1998), 7 [Hebrew].

3 Theoretically, one might also consider single words to constitute biblical allusions, especially 
in the case of hapax legomena. However, as Shulamit Elizur has compellingly argued, individual 
biblical words should never be considered biblical allusions in and of themselves. (Shulamit 
Elizur, Hebrew Poetry in Spain in the Middle Ages, vol. 3 [Tel-Aviv: The Open University, 2004], 
355-57 [Hebrew]). As she explains, individual words are the lexicographical building blocks that 
a Hebrew writer uses when formulating a sentence. If a single word could constitute a biblical 
allusion, then writers would never be able to use the word without invoking the said allusion. We 
would thus be left with the absurd situation in which a word would be effectively blocked from 
use, except in the case where a writer desires to forge an allusion to the source verse. 

4 Agnon, Al Kapot Ha-man’ul, 5. 




יעמוד (“he fled like a shadow and could not endure”). This allusion consists of no
more than two solitary words; and, further, the first word is altered in its morpho-
logical form (the biblical vav-inversive form ויברח is replaced by the normative
modern verb ברח).

We now turn to the question of how to design an algorithm to automatically 
identify allusions of this nature. Specifically, we consider how to break down the 
biblical text into minimal representational units which can then be matched to 
corresponding units in other Hebrew texts, such that these matches will capture 
the biblical allusions therein. These minimal representational units are termed 
“hashes,” and the process of breaking down a text into these units is called
“hashing.” The non-biblical Hebrew text in which we wish to identify the allu-
sions is termed the “target text.”

Given the examples of biblical allusions presented above, we might consider 
hashing every two-word lexeme pair in the biblical text. This would ensure, first 
of all, that we catch matches as short as two words. Secondly, because the hashes 
are based on lexemes and not exact wordforms, they would allow us to capture 
any case in which the two lexemes appear together, regardless of how the lexemes 
might have been altered grammatically. Further, in order to allow for allusions 
which delete, add, or change a word in the middle, we could hash every two-out- 
of-three sequence. Thus, given a sequence of three words A-B-C from the biblical 
text, we would prepare three hashes: one for A-B, one for B-C, and one for A-C, 
each time storing the lexemes underlying the two words. We would run the same 
two-out-of-three hash procedure on the target text, and then look for all cases of 
hashes in the target text which have a match in the biblical text. 
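
The two-out-of-three procedure can be sketched as follows. The sketch assumes that lemmatisation has already been done, so each text arrives as a list of lexemes; the transliterated lexemes are hand-assigned for illustration, the function name is hypothetical, and keeping the pairs in their original order is likewise an assumption.

```python
def pair_hashes(lexemes):
    """Ordered lexeme pairs A-B, B-C, A-C for every run of three words."""
    hashes = set()
    for i, first in enumerate(lexemes):
        # Pair each lexeme with the next one and with the one after that.
        for second in lexemes[i + 1:i + 3]:
            hashes.add((first, second))
    return hashes


# Hand-assigned, transliterated lexemes (illustrative only).
genesis_48_2 = ["hzq", "ysrael", "yshb", "al", "mth"]
agnon_line = ["hzq", "em", "yshb", "al", "mth"]

shared = pair_hashes(genesis_48_2) & pair_hashes(agnon_line)
print(sorted(shared))
```

Although the second word differs, four of the seven hashes of the verse recur in the target line, among them the pair that skips over the substituted word.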

Such an approach would certainly capture virtually all of the relevant cases 
of biblical allusions in the target text (high recall). The problem, however, is that 
it would capture an overwhelming amount of irrelevant material as well (low pre- 
cision); for a mere match of a pair of lexemes does not always constitute suffi- 
cient basis for establishing a biblical allusion. Indeed, using this method, almost 
every modern Hebrew sentence matches up with multiple biblical verses, based 
upon lexemes which coincidentally happen to co-occur, and the number of false 
positives quickly becomes unmanageable. To illustrate, we take the first three 
paragraphs of Agnon's aforementioned novella "In the Prime of Her Life," con- 
sisting of 198 words. My manual inspection of these paragraphs reveals 12 bona 
fide biblical allusions. Yet, running the same paragraphs through the proposed 
lexeme-pair hashing method turns up no fewer than 4,937 matches, connecting
those 198 words to over 3,400 different biblical verses. This ratio, consisting of 
several hundred false positives for every true positive, clearly renders the algorithm
useless, with a precision of less than 1%. Furthermore, considering that
the Bible only contains some 23,000 verses, it emerges that this algorithm thus 
finds “allusions” to over 10% of the Bible's verses within these three solitary
paragraphs - clearly an absurd result.
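
The arithmetic behind these figures can be checked in a few lines. The variable names are illustrative; the verse counts are the rounded values quoted above ("over 3,400" and "some 23,000"), so the last ratio is approximate, and the precision is an upper bound that assumes all 12 genuine allusions are among the matches.

```python
true_allusions = 12       # found by manual inspection
matches = 4937            # returned by naive lexeme-pair hashing
verses_hit = 3400         # "over 3,400" distinct verses (lower bound)
bible_verses = 23000      # "some 23,000" verses (approximate)

max_precision = true_allusions / matches                       # about 0.24%
false_per_true = (matches - true_allusions) / true_allusions   # about 410
share_of_bible = verses_hit / bible_verses                     # about 14.8%
```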

One might suggest that this overflow can be obviated by defining a set of “stop 
words" — very frequent words which should be ignored when hashing both the 
biblical text and the target text. We thus assembled a list of the 25 most common 
biblical words; this list includes words such as את (accusative marker), אשר (“that”),
לא (“not”), and ישראל (“Israel”). We defined these 25 words as stop words and reran
the hashing procedure as above on the Agnon text. Although this did help somewhat,
the number of false positives was still overwhelming: 2,624 matches in over
1,000 distinct verses. Furthermore, the eliminated stop words are often crucial
in identifying allusions; for instance, without the word לא (“not”), core biblical
phrases such as לא תחמד (“thou shalt not covet”) or לא תגנב (“thou shalt not steal”)
would go unnoticed. Thus, the use of stop words does not provide an adequate 
solution. 

How then can we allow for a sufficiently high degree of matching flexibility, 
without incurring a flood of meaningless results in return? 


2 What Makes an Allusion? 


The key intuition in solving this challenge is that our goal is not to find all 
instances of parallel phrases between the Bible and the target text, but rather to 
find allusions within the target text to specific biblical verses. In order for a given 
phrase to be considered an allusion to a biblical verse, it is not sufficient for it to 
incorporate a biblical phrase; rather, it must include textual material which con- 
nects it to a particular biblical verse. Thus, when it comes to Agnon’s ‘AX prnnm 
nonn by awm phrase, referenced above, a direct connection is formed to the verse 
in Genesis, because there are no other verses in the entire Bible which demonstrate
that sequence. In contrast, were a target text to contain phrases such as ערי יהודה
(“cities of Judah”), קדוש ישראל (“Holy One of Israel”) or חי אני (“as I live”),
no such connection would be formed. These three phrases are certainly biblical; 
each one of them appears over 20 times in Hebrew Scripture. However, precisely 
because of their high frequency of occurrence, their reuse in later Hebrew texts 
would not point the reader to any specific biblical verse, and thus they would not 
constitute instances of biblical allusion. 

In this sense, we may posit that the best predictor as to whether a match 
constitutes a significant biblical allusion or not is the rarity of the phrase. If a 


5 This claim is underscored and explicated by Elizur, Hebrew Poetry, 3:358. 
sequence that is unique within the biblical text recurs in the target text, then that
phrase has a high likelihood of representing a biblical allusion. 

Given this premise, we can severely cut down the number of hashes that we 
include in our search infrastructure, by discarding any hash which occurs in more 
than one verse. To be sure, in practice, the criterion of absolute uniqueness would 
be too restrictive. In many cases, a given verse or set of verses is requoted within 
Scripture itself. For instance, the decalogue appears in Exodus 20 and is repeated
almost verbatim in Deuteronomy 5. The same holds for 2 Samuel 22 and
Psalm 18 (David's song), Psalms 14 and 53, and other passages. In such cases, the repetition
does not indicate that the phrases contained therein are generic; rather, the
phrases of the decalogue would still point the reader to specific biblical content. 
Therefore, we would not wish to block allusions to this material just because of 
the recurrence. In other cases, a given phrase appears a few times in the space of 
one pericope, such that a reuse of the phrase could be said to allude to that pericope
overall. For instance, in Genesis 23 the words קבר (“bury”) and מתך (“your
deceased”) appear together three times within the chapter; nevertheless, because
they are all used within the same biblical story in the same biblical chapter, a 
reuse of that phrase could be said to point the reader specifically to this section. 
In order to make allowance for such cases, we will slightly relax the uniqueness 
requirement, allowing all hashes which occur no more than three times within 
the biblical text (henceforth: "semi-unique hashes"). Nevertheless, as we will see
in the next section, a further complication arises. 
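
As a minimal sketch of the semi-uniqueness criterion, assuming each hash occurrence is available as a (hash, verse id) pair, the filter could look like this. The function name, the input layout, and the toy glosses are hypothetical.

```python
from collections import Counter

MAX_OCCURRENCES = 3   # threshold stated in the text

def semi_unique(hash_occurrences):
    """Keep only hashes that occur at most MAX_OCCURRENCES times.

    hash_occurrences: iterable of (hash, verse_id), one item per occurrence.
    Returns {hash: [verse_id, ...]} for the surviving hashes.
    """
    occurrences = list(hash_occurrences)
    counts = Counter(h for h, _ in occurrences)
    kept = {}
    for h, verse_id in occurrences:
        if counts[h] <= MAX_OCCURRENCES:
            kept.setdefault(h, []).append(verse_id)
    return kept

# Toy data (hypothetical): one pair occurring 3 times, one occurring 20 times.
occurrences = [(("bury", "deceased"), f"pericope-{i}") for i in range(3)]
occurrences += [(("city", "Judah"), f"verse-{i}") for i in range(20)]
kept = semi_unique(occurrences)
```

Only the pair that occurs three times survives; the frequent pair is discarded because its reuse would not point to any specific verse.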


3 Beyond Lexemes 


Based on the foregoing, it might seem that it would be sufficient to simply hash 
all semi-unique lexeme pairs within the Bible. However, the reality turns out to 
be more complex. Our assertion that biblical allusions must be based on hashes 
which occur no more than three times in the Bible gives rise to a further question: 
what exactly are we counting? Is it the pair of lexemes in their precise order of 
appearance which must be semi-unique? Or in any order? Or is it the exact pair of 
words that must be semi-unique? 

In truth, the semi-uniqueness requirement can be satisfied in any one of
these ways (and many more), and it is critical to consider them all. To illustrate
the point, let's examine the first half of Psalms 89:41: פרצת כל גדרתיו (“You have
breached all his walls”). The combination of the words פרצת and גדרתיו is unique
in the Bible, and the reappearance of those specific words together in a target
text would likely constitute allusion to this verse, even without the word כל in
the middle. In contrast, the lexeme pair of פרץ (“to breach”) and גדר (“wall”)
occurs several times in the Bible in various other conjugations, such as פרץ גדר
(“one who breaches a wall,” Ecclesiastes 10:8) and פרץ גדרו (“breaching his wall,”
Isaiah 5:5). Therefore, if a target text were to use some other conjugation of the
lexemes פרץ and גדר (e.g., TMNT n5), there would be no reason for that word
pair to trigger an allusion specifically to the Psalms verse. For the pair פרצת גדרתיו,
then, uniqueness is achieved by focusing upon the particular word forms, rather
than the lexemes. On the other hand, if we look at the full phrase פרצת כל גדרתיו,
and we examine the sequence of underlying lexemes (פרץ, כל, גדר), we find that it
is completely unique across the Bible. Therefore, if a target text were to use those
three lexemes together, it would likely constitute an allusion to the Psalms verse,
even if the words were morphologically altered (e.g., Tm 9» `na). The subtle
addition of the lexeme כל provides the extra hint necessary to link a new usage of
פרץ+גדר to the verse in Psalms.

What emerges, then, is that it is not sufficient to simply hash the lexemes, 
because often the semi-uniqueness which creates a connection to a specific bibli- 
cal verse is achieved only through the exact word forms. On the other hand, if the 
lexemes are sufficiently unique to induce an allusion, then we would not want 
to suffice with a hash of the word forms; rather, we would want to use a hash of 
the lexemes, in order to capture matches in which the words are morphologically 
altered. Furthermore, it is not always a matter of choosing either lexemes or word 
forms; in many cases, the requisite semi-uniqueness is found specifically in a 
combination of the two - e.g., a pair consisting of one lexeme and one word form. 

The key principle is that, for any given biblical phrase, we wish to find the 
most generic representation which also satisfies the semi-uniqueness require- 
ment. In addition to lexemes and exact word forms, we also consider the word 
forms with prefixes removed, which we will term “base words." Thus, if the 
lexemes underlying a given phrase are not sufficiently unique, we don't neces- 
sarily need to limit matches to the verbatim phrase; rather, we can first fall back 
to the base words. For example, consider the phrase בשמחת וגיל (“with joy and
gladness,” Psalms 45:16). Although the full phrase בשמחת וגיל is unique in the
Bible, the lexeme pair is not; the combination of the lexemes שמחה (“joy”) and
גיל (“gladness”) is quite frequent in the Bible, and their co-occurrence would not
constitute an allusion to this verse. However, this does not mean that we need
to limit matches to the exact phrase בשמחת וגיל. Both of the words in this phrase
contain prefixes (ב and ו). An examination of the base words (שמחת, גיל) demonstrates
that this is the only verse where they co-occur, regardless of prefixes. We
can thus define the hash as: “base word שמחת plus base word גיל,” in order to
capture reuse cases such as 51 minawi; Sam minnwm; waw mnnvn, and so on.
All of these would constitute valid allusions to the verse. 
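
The principle of choosing the most generic representation that is still semi-unique can be sketched as below. This is an illustration under stated assumptions: each word is given as a dict with its lexeme, base word, and exact form; `count` is a hypothetical lookup returning how often a hash occurs in the Bible; and the toy counts are invented for the example.

```python
from itertools import product

LEVELS = ("lexeme", "base", "exact")   # most generic first

def most_generic_hashes(words, count, max_occurrences=3):
    """Return the semi-unique hashes that no other semi-unique hash generalizes.

    words: list of dicts with keys 'lexeme', 'base', 'exact'.
    count: callable mapping a hash, i.e. a tuple of (level, form) pairs,
           to its number of occurrences in the Bible.
    """
    candidates = []
    for levels in product(range(len(LEVELS)), repeat=len(words)):
        h = tuple((LEVELS[lvl], w[LEVELS[lvl]]) for lvl, w in zip(levels, words))
        if count(h) <= max_occurrences:
            candidates.append((levels, h))

    def covered(levels):
        # Redundant if another candidate is at least as generic in every slot.
        return any(other != levels and all(o <= lvl for o, lvl in zip(other, levels))
                   for other, _ in candidates)

    return [h for levels, h in candidates if not covered(levels)]

# Hypothetical counts: any hash that uses a lexeme is too frequent here.
toy_counts = {
    ("lexeme", "lexeme"): 12, ("lexeme", "base"): 5, ("lexeme", "exact"): 4,
    ("base", "lexeme"): 6, ("exact", "lexeme"): 4,
}

def toy_count(h):
    return toy_counts.get(tuple(level for level, _ in h), 1)

words = [{"lexeme": "JOY", "base": "joys", "exact": "with-joys"},
         {"lexeme": "GLADNESS", "base": "gladness", "exact": "and-gladness"}]
result = most_generic_hashes(words, toy_count)
```

With these toy counts, the only hash retained is the pair of base words, mirroring the "with joy and gladness" example: the lexeme pair is too common, and the exact forms are needlessly strict.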
We also consider whether the semi-uniqueness holds only given the specific
order, or regardless of order. If the latter is true, then we catalog the hash as an 
order-neutral hash, enabling it to match a wider set of allusions. For instance, the set 
of lexemes underlying the three-word phrase nopan nwa vn» (Song of Songs 5:13) 
does not appear anywhere else in the Bible, whether in that order, or in any other 
order. In such a case, there is no need to store the order-sensitive hash, because the 
order-neutral hash is sufficient to identify all relevant instances in the target text. 

Finally, we apply the same procedure with regard to hash length as well. A 
given sequence of two or three lexemes may not be sufficiently unique, but if 
we extend the sequence and add one or two subsequent lexemes, we may then 
arrive at a semi-unique hash. As above, we aim to hash it as the shortest possible 
sequence, for maximum applicability; but when the short sequence is not suffi- 
cient to secure the connection to the verse, we fall back to longer sequences. 

Naturally, applying stricter definitions in any one of these areas will allow 
more flexible definitions in a different area. For instance, a short sequence is more 
likely to satisfy the semi-uniqueness requirement if its components are defined 
strictly in terms of their full forms in their specific order, while longer sequences 
are more likely to accommodate order-neutral lexeme-based hashes. We wish to 
store all such hashes, because each one provides flexibility from a different angle. 


4 Implementation 


We now formalize the aforementioned ideas into a step-by-step algorithm, with 
specific implementation details. We start with the procedure for hashing the bib- 
lical text: 

— For every word in the biblical text, we store three separate representations: 
the full vocalized word; the vocalized word without prefix; and the lexeme 
underlying the word. All such unique word representations are assembled in 
a single linear list, and an identifier is assigned to each one. In total, this list 
numbers under 100,000 items; thus, we can store any word representation 
within 17 bits. 


6 In order to determine the precise lexeme and prefix segmentation of each word in the biblical 
text, we use the ETCBC4b tagged database of Hebrew Scripture, generously provided by the Eep
Talstra Centre for Bible and Computing, VU University Amsterdam. See: W.T. van Peursen, C. 
Sikkel, and D. Roorda, Hebrew Text Database ETCBC4b, DANS, 2015, accessed January 25, 2022, 
https://github.com/ETCBC/bhsa/tree/master/source/4b. 
— We now iterate across all words of the biblical text. For each word, we consider
a set of hashes starting at that word position, with sequence length
ranging from two to five (without crossing verse boundaries).

- For sequences of length two to three, we allow one word skip (two out of 
three, or three out of four), and for each configuration we compute all per- 
mutations of the three aforementioned word representations of each word 
within the sequence. Thus, for each starting word, we compute 18 two-word 
hashes, and 81 three-word hashes. To illustrate the process, Figure 1 rolls 
out all two-word hashes computed for a sample three-word biblical phrase. 

- For sequences of length four to five, we allow two word skips (four out of six,
or five out of seven). For these longer sequences, we don't compute hashes 
for all possible permutations of the three word representations. Rather, 
for each configuration, we compute three hashes: one based solely on the 
lexemes, one based on the base words; and one based on the exact words. 
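
The hash counts quoted here can be reproduced with a short enumeration. The sketch assumes that every configuration includes the starting word (offset 0) and draws its remaining words from the rest of the window; this assumption is inferred from the stated totals of 18 and 81 rather than spelled out in the text. Function names are hypothetical.

```python
from itertools import combinations

REPRESENTATIONS = 3   # exact word, base word, lexeme

def configurations(length, skips):
    """Offset tuples for `length` words inside a window of length + skips,
    always including the starting word at offset 0."""
    window = length + skips
    return [(0,) + rest for rest in combinations(range(1, window), length - 1)]

def full_permutation_count(length, skips):
    """Hashes per starting word when every representation mix is computed."""
    return len(configurations(length, skips)) * REPRESENTATIONS ** length

def reduced_count(length, skips):
    """Hashes per starting word when only three uniform hashes are computed
    (all lexemes, all base words, all exact words)."""
    return len(configurations(length, skips)) * 3

two_word = full_permutation_count(2, 1)      # 2 configurations x 3**2
three_word = full_permutation_count(3, 1)    # 3 configurations x 3**3
five_word_full = full_permutation_count(5, 2)
four_word_reduced = reduced_count(4, 2)
five_word_reduced = reduced_count(5, 2)
```

Under this assumption the enumeration gives 18 two-word and 81 three-word hashes per starting word, and shows why full permutation is skipped for longer sequences: 3,645 five-word hashes would be needed, against 45 with the reduced scheme.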


[Figure 1. Hashing Isaiah 24:17: all 2-word hashes, in three columns: Words 2,3 (left); Words 1,3 (middle); Words 1,2 (right), each listing hashes (1)-(9).]


Figure 1: Demonstration of our hashing procedure. Quotation marks indicate an exact word; an 
underscore indicates a base word regardless of prefix; and square brackets indicate a lexeme. 
When considering words 1, 2 (right column) and words 1, 3 (middle column), all hashes satisfy 
the semi-uniqueness criterion, and the lexeme-based hash (9) is sufficiently generic to cover all 
of the other cases. Thus, we need only store hash (9). In contrast, when considering words 2, 3 
(left column), we discard hashes (5), (6), (8), and (9), because they are not sufficiently unique. 
Of the remaining hashes, (3) and (7) are sufficiently generic to cover all of the remaining cases. 
Hash (3) pairs the exact form of word 2 with the lexeme of word 3, while hash (7) pairs the
lexeme of word 2 with the exact form of word 3.


7 The less flexible hashing for longer hashes is motivated as follows. When it comes to sets of 
two to three words, it is often the case that the lexemes or words will coincidentally co-occur 
— Regardless of length, all lexeme-based hashes are additionally stored in
an order-neutral manner in a separate hash collection.⁸

— Because each hash covers a maximum of five words, and because each 
word representation entails only 17 bits, we can handily store any hash 
in a single 128-bit integer. 

— At this point, having built up our two hash collections of order-sensitive hashes
and order-neutral hashes, we proceed to prune the collections as follows: 

- First we apply our criterion of semi-uniqueness. For each collection, any 
hash with a count of more than three is removed from the collection.⁹ At
this point, we are left with some 27 million semi-unique hashes. 

— Next, we remove all hashes which are already covered by a more generic 
hash. For instance, if a given sequence is covered both by an exact-word 
hash and also a lexeme-based hash, we can discard the former. Further, 
if the same lexeme-based hash is covered by both an order-sensitive 
hash and an order-neutral hash, we can discard the order-sensitive hash. 
In such cases, we can of course also discard any hashes of the same 
sequence that are based on a combination of word forms and lexemes. 
This process eliminates over 90% of the semi-unique hashes; at the end of


in multiple biblical verses. Biblical allusions of two or three words are thus often quite subtle, 
picking up on a slight uniqueness in one biblical form or another in order to forge the connection 
to the verse. In order to ensure that we capture these particularly subtle references, we consider 
all permutations of all the word representations; this enables us to determine the precise set of 
terms which can form a maximally flexible semi-unique connection to the verse. In contrast, 
sequences of four or more words tend to be fairly unique from the start, and allusions of four or 
more words are naturally more explicit and easier to identify. Thus, in these cases, the precise 
determination of the maximally flexible word-representation permutation is less critical.
Skipping the extra permutation calculations for the longer sequences provides significant
computation savings, due to the exponential growth in number of permutations (for each starting word
position, there would be 3,645 five-word permutations, given all permutations of the three word 
representations for each five-out-of-seven configuration). 

8 Our working assumption is that when it comes to allusions with altered word order, the exact 
word forms are less relevant, because the phrase is undergoing reformulation in any case. We 
therefore store order-neutral hashes only for the lexeme-based hashes. 

9 Of course, in computing the number of occurrences of vocalized words, it is imperative to 
count orthographic variants as the same word. From the perspective of a biblical allusion, it 
makes no difference whether text is plene or defective (e.g., nT7in vs. niTyin, or nghy vs. Tin), 
or whether an alternate matres lectionis is chosen (e.g., NoD vs. 79m). Thus, when encoding the 
word representations for the hashes, we group together orthographic variants which are pro- 
nounced identically and which fill the same morphological function. We assign a single numeric 
identifier to all variants in each group, so that their hashes will match one another. 
this stage, we are left with some 2.5 million hashes (2.1 million order-neutral
hashes, and 400,000 order-sensitive hashes).

— As noted, each of the 2.5 million hashes can be represented in 128 bits. 
We store each hash along with a list of biblical verses (and word posi- 
tions) identifying where the hash occurs. As per our semi-uniqueness cri- 
terion described above, a given hash will always be linked to a maximum 
of three verses, which fits within an additional 128-bit integer, so that the 
entirety of the hashes requires only 80 megabytes. Thus, we can easily 
keep the entire hash collection in memory, allowing for near-instant pro- 
cessing of the data. 
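
The storage scheme in the steps above can be sketched as follows. This is one possible encoding, not necessarily the one used in our system: it assumes identifiers start at 1, so that sequences of different lengths cannot collide, and it treats an order-neutral hash as the same packing applied to sorted identifiers. The names `pack`, `BITS_PER_WORD`, and `ENTRY_BYTES` are hypothetical.

```python
BITS_PER_WORD = 17    # 2**17 = 131,072, enough for under 100,000 representations
MAX_WORDS = 5

def pack(ids, order_neutral=False):
    """Pack two to five 17-bit identifiers into one integer of at most 85 bits."""
    if not 2 <= len(ids) <= MAX_WORDS:
        raise ValueError("a hash covers two to five words")
    if order_neutral:
        ids = sorted(ids)            # same key for any ordering of the lexemes
    key = 0
    for word_id in ids:
        if not 0 < word_id < (1 << BITS_PER_WORD):
            raise ValueError("identifier must be in 1 .. 2**17 - 1")
        key = (key << BITS_PER_WORD) | word_id
    return key

# Memory estimate: a 128-bit hash plus a 128-bit verse list per entry.
ENTRY_BYTES = 16 + 16
estimated_bytes = 2_500_000 * ENTRY_BYTES    # 80,000,000 bytes, i.e. 80 MB

largest = pack([(1 << BITS_PER_WORD) - 1] * MAX_WORDS)
```

Even the largest possible key occupies 85 bits, which leaves ample room within a 128-bit integer.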


Given this hash collection, we now present the algorithm for identifying these 
allusions within a target text: 




We hash the target text in an analogous manner to that of the biblical text: 
for each word within the text, we consider all sequences of length two to five 
from that starting word position, allowing for word-skips and hash permuta- 
tions as above. 

Because the input text will generally be unvocalized, a given word can admit
multiple prefix segmentations and multiple possible lexemes. For instance, the
word מדברים can be a single word מִדְבָּרִים (“deserts,” from the lexeme מִדְבָּר),
or מְדַבְּרִים (“speaking,” from the verbal lexeme דבר), or מִדְּבָרִים (“from items,”
from the lexeme דָּבָר). Therefore, for
each word, we consider word representations for all such possible analyses,
and for each sequence, we compute hashes for all permutations of the word
analyses.

If a given sequence contains a lexeme or wordform which is not attested in 
the Bible, then we can discard it immediately, because it will certainly not 
match any of the biblical hashes. 

For all remaining sequences, we compute the corresponding 128-bit hash, 
and we check to see whether that same hash appears in our collection of bib- 
lical hashes. A match indicates a basis for identifying a biblical allusion at 
that point in the text. 
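
The lookup procedure for an ambiguous target sequence can be sketched as below. For readability the sketch uses tuples of string labels as hashes instead of packed integers, and it assumes that analyses not attested in the Bible have already been filtered out of each word's list; all names and the toy data are hypothetical.

```python
from itertools import product

def candidate_hashes(analyses_per_word):
    """Yield one hash per combination of the possible analyses of each word.

    analyses_per_word: one list per word in the sequence, holding the
    biblically attested representations that the word may stand for.
    """
    if any(not options for options in analyses_per_word):
        return      # some word has no attested analysis: discard the sequence
    for combo in product(*analyses_per_word):
        yield combo

def lookup(sequence_analyses, biblical_hashes):
    """Return (hash, verses) for every candidate found in the hash collection."""
    return [(h, biblical_hashes[h])
            for h in candidate_hashes(sequence_analyses) if h in biblical_hashes]

# Toy data (hypothetical): the first word has three possible lexemes.
biblical_hashes = {("L:desert", "L:voice"): ["verse-1"]}
ambiguous = [["L:desert", "L:speak", "L:thing"], ["L:voice"]]
found = lookup(ambiguous, biblical_hashes)
unattested = lookup([["L:desert"], []], biblical_hashes)
```

Only the analysis that yields a stored hash produces a match; a sequence containing a word with no attested analysis is dropped before any lookup.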


5 Interactive Online Implementation


A full interactive implementation of the algorithm presented here is freely available
on the Dicta website: https://citation.dicta.org.il/. The site (pictured in Figure 2) 
allows the user to paste in any Hebrew text, and to immediately receive a visual 
display of the results, mapping each biblical allusion in the input text to the relevant
biblical verse.¹⁰




Figure 2: The Dicta Citation Finder interface, https://citation.dicta.org.il/, showing the input 
target text on the right (A), and identified biblical allusions on the left (B). The user can use
a slider (C) to adjust the threshold, setting the precision/recall balance as desired. The user
can request a download of the results in Microsoft Word format (D); in the resulting file, all 
identified allusions are integrated as footnotes. The relevant biblical verses are quoted in full in 
the text of the footnotes, and the relevant words within each verse are highlighted in bold. 


Significantly, the site includes a slider to allow the user to set a threshold indicating
how closely the text must match a verse in order to be considered a biblical allusion.
As described above, every hash match indicates a possible basis for a biblical
allusion, because it represents a case in which the target text uses a set of words
or lexemes which is semi-unique within the Bible. Nevertheless, some matches are
more significant than others, and some may seem to be more of a coincidence than
an intended allusion.¹¹ In order to address this issue, we apply a scoring system to


10 For the design and implementation of the interactive front-end, I wish to credit the following 
members of the Dicta team: Joshua Guedalia, Dvora Bloch, Dovid Lipman, Aryeh Sanders, and 
Rivka Sharfman. 

11 One example is the phrase כי אמרה כי (“because she said that”), which appears on the first
page of Agnon’s “In the Prime of Her Life” (Agnon, Al Kapot Ha-man’ul, 5). This phrase appears 
only a single time in all of Hebrew Scripture (Genesis 29:32), and thus prima facie its appear- 
ance in Agnon might be said to allude to that verse. However, because the phrase comprises two 
frequent function words and one very common verb, it may be considered a coincidental juxta- 
rate the matches returned by the algorithm. Matches are awarded points for length
and for rareness of the individual words within the match, and they are penal- 
ized for interpolated or deleted words, for morphologically altered words, and for 
deviation from the order of the biblical text. The slider allows the user to filter the 
matches based upon this score, effectively adjusting the precision/recall balance 
of the results. A higher threshold will ensure high precision - that is, the matches 
displayed will virtually all comprise cases of compelling and significant biblical 
allusions - but it will come at the cost of lower recall, wherein meaningful allu- 
sions might be omitted. On the flipside, a lower threshold ensures higher recall, 
but with more false positives. 
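
The shape of such a scoring scheme can be illustrated as follows. The weights, the rarity scale, and the field names are hypothetical values chosen for the example; they are not the values used on the site.

```python
from dataclasses import dataclass

@dataclass
class Match:
    length: int           # number of matched words
    rarity: float         # summed rarity of the matched words (hypothetical scale)
    skipped_words: int    # interpolated or deleted words
    altered_words: int    # morphologically altered words
    reordered: bool       # deviates from the biblical word order

# Hypothetical weights, for illustration only.
WEIGHTS = {"length": 5.0, "rarity": 1.0, "skip": 3.0, "altered": 2.0, "reorder": 4.0}

def score(m):
    """Reward length and rarity; penalize skips, alterations, and reordering."""
    return (WEIGHTS["length"] * m.length
            + WEIGHTS["rarity"] * m.rarity
            - WEIGHTS["skip"] * m.skipped_words
            - WEIGHTS["altered"] * m.altered_words
            - WEIGHTS["reorder"] * m.reordered)

def filter_matches(matches, threshold):
    """Keep matches scoring at least `threshold` (the slider position)."""
    return [m for m in matches if score(m) >= threshold]

strong = Match(length=4, rarity=10.0, skipped_words=0, altered_words=0, reordered=False)
weak = Match(length=2, rarity=3.0, skipped_words=1, altered_words=1, reordered=True)
high_precision = filter_matches([strong, weak], threshold=15)
high_recall = filter_matches([strong, weak], threshold=0)
```

Raising the threshold can only remove matches, never add them, which is why the slider trades recall for precision.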

Different use cases will entail different requirements in this regard. A user 
who is focusing on a short text — perhaps a single stanza of a poem - will wish to 
aim for maximum recall, even if it means wading through many less compelling 
matches. Thus, such a user will wish to set the slider to the minimum setting. In 
contrast, a user who is working on a lengthy text will likely prefer a higher thresh- 
old, to immediately identify as many substantial allusions as possible, even if it 
means sacrificing a certain number of less evident cases. 


6 Evaluation 


In order to evaluate the performance of our algorithm, we examine the first chapter
of the book The Fathers and the Sons by S.J. Abramowitch (also known as "Mendele
Mocher Sforim").¹² A domain expert evaluated the chapter (1,929 words in length)
and identified 77 phrases as valid biblical allusions, alluding to 91 verses. Our algorithm
successfully identified 76 of these 91 verses (recall = 83.5%), with a precision
of 60%.¹³ An inspection of the false positives indicates that over half of them are
extra support verses for the same valid allusions.¹⁴ We thus also consider how well


position of common words, rather than a meaningful allusion. Accordingly, the scoring system 
on the site rates the phrase rather low, such that the phrase will not appear at most thresholds. 
However, for users who wish to examine every reasonable possibility to ensure maximum recall, 
setting the threshold to a low setting will nevertheless turn up this match. 

12 Accessed January 25, 2022, https://benyehuda.org/read/5766. 

13 When evaluating the test set, we used a threshold of 18, equivalent to placing the slider one 
third of the way from the minimum position. We have found that this threshold generally pro- 
vides optimal results. 

14 For instance, regarding the phrase בושת וכלמה, the domain expert noted Psalms 35:26, which
contains those two words in succession. Our algorithm identified the Psalms verse, plus another 
verse with those same two lexemes, albeit with an intervening word (Isaiah 61:7). 
our algorithm succeeds in identifying the phrases within the target text which comprise
valid biblical allusions. Regarding this evaluation, we achieve a recall of 86%
(66 out of 77), with a precision of 81% (15 false positives).
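
The reported figures follow directly from the standard definitions of recall and precision, as this short check shows. The helper names are illustrative. The verse-level false-positive count is not stated in the text; the value below is only what a precision of 60% would imply.

```python
def recall(true_positives, relevant):
    return true_positives / relevant

def precision(true_positives, false_positives):
    return true_positives / (true_positives + false_positives)

verse_recall = recall(76, 91)           # about 0.835
phrase_recall = recall(66, 77)          # about 0.857
phrase_precision = precision(66, 15)    # about 0.815

# Inferred, not reported: false positives implied by 60% verse-level precision.
implied_verse_false_positives = 76 * (1 - 0.60) / 0.60   # roughly 51
```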

We release our test set as a public domain Zenodo dataset, so that it can be 
used to benchmark future research.¹⁵


7 Limitations and Future Directions 


The algorithm discussed herein assumes that at the basis of any biblical allusion 
lies a set of shared lexemes with the biblical text, which may be morphologically 
altered or reordered in the target text. However, there is another class of biblical
allusions, which connect to the text not with lexemes, but with paronomasia.
For instance, consider the phrase שבטו יתן (“he shall apply his staff”) used in a
consolation poem penned by Moroccan sages during the mid-19th century.¹⁶ This
phrase clearly alludes to Exodus 21:19, שבתו יתן (“he must pay for his idleness”).
The connection is cleverly forged by words which sound identical (שבטו, שבתו),
but which are actually derived from two different lexemes (the first from שבט =
"staff," and the second from שבת = "inaction"). The present algorithm would fail
to find such allusions. In the future we hope to extend our algorithm to capture 
cases of soundplay as well. 


8 Conclusion 


The automation of biblical allusion identification presents a formidable chal- 
lenge, because it entails capturing extremely flexible and subtle matches, while at 
the same time demanding discernment to prevent a flood of false positives. This 
chapter presented an algorithm to efficiently address this challenge. The algorithm 
allows instant processing of target texts while utilizing only a minimal amount of 
computer memory, capturing a wide range of biblical allusions both explicit and 
subtle, while limiting false positives to a manageable quantity. 

The significance of this algorithm within the overall framework of Jewish 
Studies in the digital age is two-fold. First, this algorithm allows efficient automa- 


15 Avi Shmidman, "Biblical Allusions Test Set," Zenodo, accessed January 25, 2022,
http://doi.org/10.5281/zenodo.5059159.
16 David Ovadia, La communaute de Sefrou, vol. 2 (Jerusalem: self-pub., 1975), 84. 
tion of processes previously performed laboriously by hand. Jewish Studies texts
make extensive use of biblical allusions, regardless of genre: whether we are dealing 
with halakhic responsa, liturgical poetry, haskalah literature, or otherwise, biblical 
allusions abound, and those who prepare critical editions of such texts aim to doc- 
ument as many of those allusions as possible in the accompanying critical com- 
mentaries. Until now, the identification of such allusions throughout a text entailed 
substantial time and effort on the part of the editors and publishers of such edi- 
tions. Fortunately, the shift to the digital age has brought along with it algorithms 
such as the one presented herein, which can greatly expedite the process of allusion 
identification. However, the impact of this algorithm upon Jewish Studies research 
is not limited to speeding up existing research patterns. Rather, the ability to auto- 
matically identify biblical allusions within a corpus of unlimited size paves the path 
for new "big data" analyses of Jewish Studies texts. For instance, it is now practical
to generate statistical characterizations of texts in terms of the density of biblical 
allusions within them, and in terms of the particular biblical books that they tend 
to quote. Similarly, we can now generate statistical profiles for any given author of 
Hebrew texts, based upon the author's predilections in terms of biblical allusions. 
In this regard, the dawn of the digital age opens up new directions for comparing, 
contrasting, and analyzing Jewish Studies texts. 


References 


Abramowitch, S.J. The Fathers and the Sons. Accessed January 25, 2022,
https://benyehuda.org/read/5766. [Hebrew]

Agnon, S.Y. Al Kapot Ha-man'ul. Jerusalem and Tel-Aviv: Schocken, 1998. [Hebrew] 

Elizur, Shulamit. Hebrew Poetry in Spain in the Middle Ages, vol. 3. Tel-Aviv: The Open 
University, 2004. [Hebrew] 

Gusfield, Dan. Algorithms on Strings, Trees, and Sequences. Cambridge: Cambridge University 
Press, 1997. 

Ovadia, David. La communaute de Sefrou, vol. 2. Jerusalem: self-pub., 1975. [Hebrew] 

Peursen, W.T. van, C. Sikkel, and D. Roorda. Hebrew Text Database ETCBC4b. DANS. 2015. 
Accessed January 25, 2022. https://github.com/ETCBC/bhsa/tree/master/source/4b. 

Shmidman, Avi. “Biblical Allusions Test Set.” Zenodo. Accessed January 25, 2022.
http://doi.org/10.5281/zenodo.5059159.


Daria Vasyutinsky Shapira, Irina Rabaev, Ahmad Droby, 
Berat Kurar Barakat, and Jihad El-Sana 


Is a Deep Learning Algorithm Effective 
for the Classification of Medieval 
Hebrew Scripts? 


Abstract: In this research, we apply deep-learning techniques to Hebrew paleog- 
raphy to automatically classify and process medieval Hebrew manuscripts. Our 
work is based on contemporary Hebrew paleography (Malachi Beit-Arié, Colette 
Sirat, Norman Golb, Ada Yardeni, Benjamin Richler) that recognizes fifteen sub-types
of medieval Hebrew script. Automatic recognition of these scripts makes it possible
to determine the approximate origin and date of writing for undated, fragmentary,
and damaged manuscripts. To train the deep neural network, we compile
a Visual Media Lab - Hebrew Paleography (VML-HP) dataset that contains 537 
high-resolution manuscript page images. The images were hand-picked from the 
SfarData (http://sfardata.nli.org.il/) dataset; in some rare cases, we also included
pages from other manuscript collections. For testing the model, we define a
notion of typical and blind test sets. The typical test set consists of the unseen 
pages of the manuscripts used in training. The blind test set, on the contrary, 
consists of pages from unseen manuscripts, thus, providing us with a real-life 
scenario. To train the model, we used patches extracted from the documents' 
pages. To filter irrelevant patches (empty patches or patches that contain dec- 
orations), we developed a clean patch generation algorithm that can generate 
patches containing pure text regions (for the VML-HP dataset, we generated 150K 
train patches). In all the experiments, we trained the network on the training set 
and tested it on both test sets, typical and blind. The training objective
was cross-entropy loss, minimized using the Adam optimizer.
The training was performed until there was no improvement in validation loss 
with five epochs' patience. The model with the least validation loss was used for 
testing. 


Keywords: deep learning, digital Hebrew paleography 


In this chapter we present an interdisciplinary project that applies deep learning models to classify script types and sub-types in medieval Hebrew manuscripts. It incorporates the techniques and databases of Hebrew paleography and (with reservations) Hebrew codicology. This research project is part of our ongoing effort to develop algorithmic tools for processing historical documents within the Visual Media Lab at the Department of Computer Science at Ben-Gurion University of the Negev, Israel.1

Open Access. © 2022 Daria Vasyutinsky Shapira et al., published by De Gruyter. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://doi.org/10.1515/9783110744828-016

The ongoing digitization of manuscript collections kept in different libraries worldwide is making available more and more manuscripts that once could be studied only in situ. We have every reason to believe that, within a few years, thousands more manuscripts around the globe will be properly digitized and available online. In the case of Hebrew manuscripts, this process is already very advanced: the Institute for Microfilmed Hebrew Manuscripts at the National Library of Israel already hosts more than 70,000 microfilms and thousands of digital images. These digitized documents constitute more than 90% of the known Hebrew manuscripts. Thus, automatic processing, or at least the primary computerized categorization of manuscripts, has become the most urgent task of modern Hebrew paleography.

Hebrew paleography emerged in the mid-20th century, side by side with modern Latin paleography, and with the same basic principles. The theoretical basis of Hebrew paleography is formulated in the works of Malachi Beit-Arié,2 Norman Golb,3 Benjamin Richler,4 Colette Sirat,5 and Ada Yardeni.6 Contemporary Hebrew paleography identifies six main-types of scripts: Ashkenazi, Italian, Sephardic, Oriental, Byzantine, and Yemenite. Each main script type may contain up to three sub-types of scripts: square, semi-square, and cursive. In total, there are 15 Hebrew script sub-types. The paleographical classification of the ground truth for our project comes from the SfarData dataset,7 which includes full codicological descriptions and paleographical definitions of all dated medieval Hebrew manuscripts until the year 1540 (this constitutes about 95% of the known dated medieval Hebrew manuscripts). The SfarData project was initiated by Malachi Beit-Arié in the 1970s and is currently hosted on the website of the National Library of Israel.


1 The participation of Dr. Vasyutinsky Shapira in this project is funded by the Israeli Ministry of Science, Technology and Space, Yuval Ne’eman scholarship n. 316784.

2 Malachi Beit-Arié, Hebrew Codicology (Jerusalem: Israel Academy of Sciences and Humanities, 1981); Malachi Beit-Arié and Edna Engel, Specimens of Mediaeval Hebrew Scripts, 3 vols. (Jerusalem: Israel Academy of Sciences and Humanities, 1987, 2002, 2017). See now the complete opus magnum Malachi Beit-Arié, Hebrew Codicology - map 517717 (2021). Accessed January 26, 2022. http://doi.org/10.25592/uhhfdm.8849.

3 Norman Golb and Omeljan Pritsak, Khazarian Hebrew Documents of the Tenth Century (Ithaca, NY: Cornell University Press, 1982).

4 Binyamin Richler and Malachi Beit-Arié, eds., Hebrew Manuscripts in the Biblioteca Palatina in Parma: Catalogue (Jerusalem: Hebrew University of Jerusalem, Jewish National and University Library, 2001); Benjamin Richler, Malachi Beit-Arié, and Nurit Pasternak, “Hebrew Manuscripts in the Vatican Library,” Catalogue. Compiled by the Staff of the Institute of the Microfilmed Hebrew Manuscripts, Jewish National and University Library (Città del Vaticano) (2008).

5 Colette Sirat, Hebrew Manuscripts of the Middle Ages (Cambridge: Cambridge University Press, 2002).

6 Ada Yardeni, The Book of Hebrew Script: History, Palaeography, Script Styles, Calligraphy and Design (Jerusalem: Carta, 1997).

7 Accessed January 26, 2022, http://sfardata.nli.org.il/.

Our project is ongoing research. Our current goal is to develop algorithms that recognize Hebrew scripts and their sub-types. The practical applications at this stage would include:

- Determining the date and the area of writing. The paleographical classification of verified manuscripts enables machine learning models to learn the features common to each type and sub-type. The trained models can determine the sub-type of a query manuscript, which in turn allows us to estimate the date of an undated manuscript or its place of copying. Thus, applying this technology to fragmentary and forged texts has the potential to roughly estimate where and when they were written. Today this task poses serious challenges, and often only an experienced librarian or paleographer is capable of a plausible guess. There are many forged and incorrectly dated manuscripts on the basis of which historical theories and histories of entire peoples are built. The use of a well-trained algorithm will allow us to resolve such issues objectively.

- Already at this stage, we expect the algorithm to be capable of producing a rough catalog of a collection of manuscripts where no trained human paleographer is available. Despite the effort of the Institute for Microfilmed Hebrew Manuscripts to assemble digital images of all the known Hebrew manuscripts, there are still important collections that have not been digitized and properly cataloged, such as the large collection of Hebrew manuscripts in the Vernadsky Library in Kyiv, currently in a most alarming state of preservation. Even a basic catalog made by the algorithm could attract much-needed scholarly attention to such collections.

- Identifying important parts of a manuscript, such as colophons and owners' notes. These additions to a manuscript are often written in a different script sub-type. Identifying them allows a researcher to recognize the date, place of copying, name of the scribe, etc.

- Tracking the movement of scribes, scholars, and communities over time through script and/or hand similarities.

- When the algorithm is further trained to recognize specific words, we would apply it to the biggest manuscript collections, such as the Firkowicz collections kept in St. Petersburg, the collections of the Bibliothèque nationale de France, and others. This will allow us to take a closer look at some intriguing and fascinating but extremely complicated objects of research, where pieces of information about them are scattered in libraries around the globe. To take just one example, we could learn more about the Jews of Magna Graecia, with their physical and social mobility and intricate history. The relevant manuscripts from different libraries' collections can be identified, brought together, connected, and sorted out with the help of machine learning.

- Another possible application at this stage is research on little-known, challenging, and often mysterious marginal Jewish communities, such as the Georgian, Bukharan, and Mountain Jews, about whom little is known today and whose history remains to a great extent legendary. The history and works of the Jews of the Kingdom of the Two Sicilies and the Jews of Malta (to whom the famous kabbalist Abraham Abulafia belonged) before the expulsion by the king of Aragon are another example of a potential application of the algorithm.


There are several ongoing projects in the research of Hebrew manuscripts that complement ours; the most important among them are the Friedberg Genizah Project with its Cairo Genizah site8 and the Judeo-Arabic corpus,9 the eScriptorium,10 and the Haifa Project for Research on the Dead Sea Scrolls.11 There is also a very promising project at Bar-Ilan University that works on building Hebrew manuscript metadata records and focuses on manuscripts dated after 1540, i.e., later than the period covered by classical Hebrew paleography.12 Similar efforts to train an algorithm to recognize script types and build a web database exist in Latin paleography,13 with its database.14 A recent deep learning method15 studies the impact of varying patch sizes on the performance of writer identification for modern handwritten documents. Its results show that performance depends on the patch size and that a different patch size gives the best performance for each dataset. Arabadjis et al.16 classify the hands that wrote a given set of historical Byzantine codices using manually designed features for matching a similarity score.


8 Accessed January 26, 2022, https://fjms.genizah.org/.

9 Accessed January 26, 2022, https://ja.genizah.org/Home.aspx?isDoubleLogin=False.

10 Accessed January 26, 2022, https://www.escriptorium.uk/.

11 Accessed January 26, 2022, http://megillot.haifa.ac.il/index.php/en/.

12 Gila Prebor, Maayan Zhitomirsky-Geffet, and Yitzchak Miller, “A New Analytic Framework for Prediction of Migration Patterns and Locations of Historical Manuscripts Based on Their Script Types,” Digital Scholarship in the Humanities 35, no. 2 (2020): 441-58.

13 Florence Cloppet et al., “Icdar2017 Competition on the Classification of Medieval Handwritings in Latin Script,” 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1 (IEEE, 2017), 1371-76; Linda Studer et al., “A Comprehensive Study of Imagenet Pre-training for Historical Document Image Analysis,” 2019 International Conference on Document Analysis and Recognition (ICDAR) (IEEE, 2019), 720-25.

14 Accessed January 26, 2022, http://www.digipal.eu/.

15 Akshai Punjabi et al., “Writer Identification Using Deep Neural Networks: Impact of Patch Size and Number of Patches,” 2020 International Conference on Pattern Recognition (IEEE, 2020), 3065-68.

In our project, we built a dataset of medieval Hebrew manuscripts, Visual Media Lab - Hebrew Paleography (VML-HP). The VML-HP dataset includes 537 pages labeled with 15 script sub-types. To the best of our knowledge, this is the first publicly available Hebrew paleographic dataset. Currently, the dataset can be downloaded from https://www.cs.bgu.ac.il/~berat/. To provide a common baseline for algorithm assessment and comparison, we supply a fixed partition of VML-HP. The dataset is split into a training set and two test sets. The first test set, the typical test set, consists of unseen pages of documents present in the training set. The second, the blind test set, contains unseen manuscripts and imitates a real-life scenario. We present a case study for script type classification on the introduced dataset. We introduce a homogeneous style patch extraction method, where each patch contains approximately the same number of lines. We also compare several established deep learning classification models and preprocessing methods. The results show that there is considerable room for improvement on the blind test set, whereas the typical test set is an easier problem. Currently, we are exploring more advanced deep learning architectures that can capture fine-grained features of the Hebrew manuscripts. Fine-grained features are features that differentiate between hard-to-distinguish object classes, such as subtle differences in letter forms in different script sub-types.
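
The difference between the two test sets can be pictured with a short sketch. It is an illustration only, under stated assumptions: the function name `split_typical_blind`, the `(manuscript_id, page_id)` tuples, and the 20% typical fraction are hypothetical and are not taken from the VML-HP release, which ships its own fixed partition.

```python
import random
from collections import defaultdict


def split_typical_blind(pages, blind_manuscripts, typical_fraction=0.2, seed=0):
    """Split (manuscript_id, page_id) pairs into train, typical and blind sets.

    Hypothetical helper: manuscripts listed in `blind_manuscripts` are held
    out entirely; every other manuscript contributes most of its pages to
    training and a few unseen pages to the typical test set.
    """
    rng = random.Random(seed)
    by_manuscript = defaultdict(list)
    for manuscript_id, page_id in pages:
        by_manuscript[manuscript_id].append(page_id)

    train, typical, blind = [], [], []
    for manuscript_id, page_ids in sorted(by_manuscript.items()):
        if manuscript_id in blind_manuscripts:
            # Blind set: the network never sees any page of this manuscript.
            blind.extend((manuscript_id, p) for p in page_ids)
            continue
        shuffled = sorted(page_ids)
        rng.shuffle(shuffled)
        # Keep at least one page for training when a manuscript has one page.
        n_typical = 0
        if len(shuffled) > 1:
            n_typical = max(1, round(len(shuffled) * typical_fraction))
        typical.extend((manuscript_id, p) for p in shuffled[:n_typical])
        train.extend((manuscript_id, p) for p in shuffled[n_typical:])
    return train, typical, blind
```

In such a split, a model can score well on the typical set by memorizing manuscript-specific traits, whereas the blind set only rewards features that carry over to new manuscripts.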


16 Dimitrios Arabadjis et al., “A General Methodology for Identifying the Writer of Codices: Application to the Celebrated ‘Twins,’” Journal of Cultural Heritage 39 (2019): 186-201.


1 Method


We propose to develop a computational tool that can recognize the script sub-type of a given Hebrew manuscript. Conventional recognition methods utilize handcrafted features, which mainly depend on careful design and expert knowledge. More advanced recognition methods are based on deep learning and can acquire effective feature representations from training data. Deep learning algorithms are built on neural networks, which are inspired by the architecture of the human brain and consist of neurons and the synapses among them. A deep learning algorithm is organized as a stack of layers, each of which is a collection of feature extractors, so-called filters (Figure 1). Raster pixel values of a document image patch are fed into the network and are transformed into feature maps as they pass forward through the layers. Each layer extracts features at a different abstraction level. Initial layers detect primitive features such as dots, lines, and curves. Final layers combine these features into complex features such as corners, circles, and letters. At the final layer, the document image patch is classified into one of the script sub-types. The filters are updated according to a measure of the difference between the target and the predicted labels. The major drawback of a deep learning network is that it requires a large amount of labeled training data, for example, 1,000 samples per class.
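
The "measure of the difference between the target and the predicted labels" is, according to the abstract, the cross-entropy loss. The following is a minimal sketch of that loss for a single patch, not the project's training code; the function names are hypothetical, and `scores` stands for the raw outputs of the final layer, one per script sub-type.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to one."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target_index):
    """Loss for one patch: negative log-probability of the true sub-type."""
    return -math.log(softmax(scores)[target_index])


def gradient_wrt_scores(scores, target_index):
    """Gradient of the loss with respect to the scores: probabilities minus one-hot target."""
    probs = softmax(scores)
    return [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
```

An untrained network that assigns equal scores to all 15 sub-types has a loss of ln 15 ≈ 2.71; training pushes the score of the correct class up and the others down, and the same gradient is propagated back to update the filters.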


Figure 1: Illustration of a deep learning network. It consists of stacked feature extraction layers. The early layers extract primitive features, and the later layers extract more complex features. Figure prepared by authors.


2 Building the dataset 


SfarData, the database of Hebrew paleography and codicology compiled by Malachi Beit-Arié and his team, contains descriptions and classifications of almost all known dated medieval Hebrew manuscripts; all the manuscripts in the database were studied in the libraries where they were kept. Malachi Beit-Arié and his team met with our team, discussed our project, gave us their full support, and allowed us to use their database in its entirety. Our team's paleographer, who is herself a student of Malachi Beit-Arié, handpicked digitized pages from the manuscripts described in SfarData as the raw material for our project. When we had to add manuscripts not described in SfarData for certain script sub-types, our paleographer picked them in accordance with the classification of medieval Hebrew manuscripts as described in SfarData.

Pages in the VML-HP dataset were extracted from high-quality digitized manuscripts, and we gave first preference to those kept in the National Library of Israel. We also used manuscripts from other libraries, first and foremost the British Library and the Bibliothèque nationale de France, with their significant collections of digitized manuscripts available for download. The dataset includes 537 pages in total. Table 1 details the distribution of the number of pages per main-type and sub-type of script in the training set and the two test sets.


Table 1: Summary of the VML-HP dataset (number of pages).


Main-Type   Sub-Type      Train   Typical Test   Blind Test   Total
Ashkenazi   Square           16              4           10      30
            Semi-Square      16              3           10      29
            Cursive          16              4           10      30
            Total            48             11           30      89
Byzantine   Square           16              4           10      30
            Semi-Square      16              4           10      30
            Total            32              8           20      60
Italian     Square           16              4           10      30
            Semi-Square      16              4           10      30
            Cursive          16              4           10      30
            Total            48             12           30      90
Oriental    Square           64             14           10      88
            Semi-Square      16              4           10      30
            Total            80             18           20     118
Sephardic   Square           16              4           10      30
            Semi-Square      24              6           10      40
            Cursive          16              4           10      30
            Total            56             14           30     100
Yemenite    Square           24              6           10      40
            Semi-Square      24              6           10      40
            Total            48             12           20      80

Total                       312             75          150     537
3 Clean Patch Generation Algorithm


The VML-HP dataset contains 537 pages that represent 15 script sub-types. The ideal solution would be to feed whole pages into the network, because with larger input images the network can capture more fine-grained features.17 However, the input image cannot be larger than the available memory allows. We therefore balance this tradeoff by cropping image patches that contain approximately five text lines, which is a sufficient size for human paleographers to classify the script type. Some parts of the pages contain irrelevant information, such as decorations, marginal drawings, or noisy background, as illustrated in Figure 2. We therefore developed a clean patch generation algorithm (https://www.cs.bgu.ac.il/~berat/data/hp_dataset.zip) that generates patches containing pure text regions and an approximately equal number of text lines.

To achieve this, we first calculate for each page a square patch size s × s that will include five lines. Then, we extract random patches of size s × s. The size of the extracted patches, i.e., the value of s, varies across manuscripts. Therefore, to remain consistent with our previous experiments, the patches are resized to 350 × 350. Examples of such clean patches are shown in Figure 3.

Figure 2: Example output patches from a naive patch generation algorithm. Some patches contain irrelevant features, some contain only a few characters, and others do not contain any text.

Figure 3: Example output patches from the clean patch generation algorithm.


17 Mingxing Tan and Quoc Le, “Efficientnet: Rethinking Model Scaling for Convolutional Neural Networks,” International Conference on Machine Learning (PMLR, 2019).


The patch size s × s for each page is calculated by first extracting k random patches whose side equals one-tenth of the page height, as a patch of this size usually includes several text lines. The number of lines in a given patch is then computed by counting the peaks of its y profile after smoothing with a Savitzky-Golay filter. Finally, the desired patch size is given by $s = \frac{h}{10} \cdot \frac{n}{m}$, where h is the height of the page, n is the targeted average number of lines, and m is the actual average number of lines in the k extracted patches. We used n = 5 and k = 20.
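
The patch size estimate can be sketched in a few lines. This is a rough illustration under stated assumptions rather than the released algorithm: the function names are hypothetical, a plain moving average stands in for the Savitzky-Golay filter named above, a peak is counted only when it rises above the mean of the smoothed profile, and the size is computed as one-tenth of the page height scaled by the ratio of targeted to observed lines.

```python
def smooth(profile, window=5):
    """Moving-average smoothing (a simple stand-in for a Savitzky-Golay filter)."""
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        smoothed.append(sum(profile[lo:hi]) / (hi - lo))
    return smoothed


def count_lines(y_profile, window=5):
    """Count text lines as local maxima of the smoothed row-wise ink profile."""
    smoothed = smooth(y_profile, window)
    threshold = sum(smoothed) / len(smoothed)  # ignore ripples below the mean
    peaks = 0
    for i in range(1, len(smoothed) - 1):
        if smoothed[i] > threshold and smoothed[i - 1] < smoothed[i] >= smoothed[i + 1]:
            peaks += 1
    return peaks


def patch_size(page_height, line_counts, target_lines=5):
    """Side length s expected to hold about `target_lines` text lines.

    `line_counts` holds the line counts of the k probe patches, each of
    which has a side of page_height / 10.
    """
    mean_lines = sum(line_counts) / len(line_counts)
    return round((page_height / 10) * target_lines / mean_lines)
```

For example, on a page 3,000 pixels high whose probe patches of 300 pixels contain three lines on average, the sketch returns a patch side of 500 pixels.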
Furthermore, each extracted patch is validated according to the following conditions:

- The foreground area should be at least 20% of the total patch area and not exceed 70% of the total patch area. This condition eliminates almost empty patches and patches with large spots, stains, or decorations.

- The patch should contain at least 30 connected components. This condition eliminates patches with few foreground elements.

- The variances of the x and y profiles, denoted by $\sigma_x$ and $\sigma_y$, respectively, should satisfy the conditions $\sigma_x < T_x$ and $\sigma_y > T_y$. Assuming horizontal text lines, the variance of the x profile should be relatively low. During our experiments we set $T_x = 1500$ and $T_y = 500$.

- The following inequality should be satisfied:

  $$0.5 < \frac{\sum_{i=1}^{v/2} P_x(i)}{\sum_{i=v/2}^{v} P_x(i)} < 1.5$$

  where v is the number of values in the x profile and $P_x(i)$ is the i-th value. This condition eliminates the patches with text lines that occupy only a fraction of a patch.
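
The four validation conditions can be sketched as a single predicate over a binarized patch. This is an illustrative reading, not the released implementation: the function names are hypothetical, the patch is assumed to be a list of rows with 1 for ink and 0 for background, components are counted with 4-connectivity, and the last condition is assumed to compare the sums of the left and right halves of the x profile. The default thresholds are the values quoted above and depend on the pixel scale of the patches.

```python
from collections import deque


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def count_components(patch):
    """Count 4-connected foreground components with a breadth-first flood fill."""
    height, width = len(patch), len(patch[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if patch[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and patch[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def is_clean_patch(patch, min_fg=0.2, max_fg=0.7, min_components=30,
                   t_x=1500, t_y=500):
    """Return True when the patch passes all four conditions."""
    area = len(patch) * len(patch[0])
    foreground = sum(map(sum, patch)) / area
    if not min_fg <= foreground <= max_fg:       # condition 1: ink coverage
        return False
    if count_components(patch) < min_components:  # condition 2: enough glyph parts
        return False
    x_profile = [sum(col) for col in zip(*patch)]  # ink per column
    y_profile = [sum(row) for row in patch]        # ink per row
    if not (variance(x_profile) < t_x and variance(y_profile) > t_y):
        return False                              # condition 3: line structure
    half = len(x_profile) // 2
    left, right = sum(x_profile[:half]), sum(x_profile[half:])
    return right > 0 and 0.5 < left / right < 1.5  # condition 4: balanced halves
```

Because the variance thresholds scale with the patch dimensions, a toy example on a small grid needs smaller values of `t_x` and `t_y` than the 1500 and 500 used for full-resolution patches.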


4 Results 


We experimented with several convolutional network architectures. In all the experiments, we train the network on the training set and test it on both test sets, the typical and the blind. We generated 150,000 patches from the training set, 10,000 patches from the typical test set, and 10,000 patches from the blind test set. The patches are generated using the clean patch generation algorithm described in the previous section and are resized to 350 × 350 pixels. The generated patches are equally distributed among all of the script types. The classification results are evaluated by patch level accuracy and page level accuracy. For page level accuracy, the label of a page is computed by taking the majority vote of the predictions for 15 patches from the page.
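
The two evaluation measures can be sketched as follows. The function names and the input layout, a list of `(true_label, patch_predictions)` pairs with one entry per page, are hypothetical; breaking ties in favor of the label that appears first is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def page_label(patch_predictions):
    """Majority vote over the patch predictions of one page."""
    counts = Counter(patch_predictions)
    best = max(counts.values())
    # Assumed tie-break: the tied label that occurs first in the list wins.
    for label in patch_predictions:
        if counts[label] == best:
            return label


def patch_accuracy(pages):
    """Fraction of individual patches whose prediction matches the page's true label."""
    total = sum(len(preds) for _, preds in pages)
    correct = sum(pred == true for true, preds in pages for pred in preds)
    return correct / total


def page_accuracy(pages):
    """Fraction of pages whose majority-vote label matches the true label."""
    correct = sum(page_label(preds) == true for true, preds in pages)
    return correct / len(pages)
```

A page with 9 of 15 patches classified correctly counts as fully correct at page level, while a page with 7 of 15 counts as fully wrong, which is why the two measures can diverge in either direction.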
4.1 Classifying Into 15 Script Types


Table 2 shows the accuracy results for classifying 15 script sub-types using different convolutional networks and compares the results on the typical and blind test sets at patch and page levels. As we can see from the results, the typical test set patches and pages are easier to classify. The gap between the results on the typical and blind test sets shows that the models are overfitting. The models have seen other pages of the manuscripts in the typical test set during training, whereas the blind test set contains pages from unseen manuscripts. The difference in results shows that the models learn features, such as background texture, that are specific to the manuscripts and not to the script type. At nearly all levels and sets, the performance of the ResNet50 classifier is the highest; however, it does not surpass 40% accuracy on the blind test set. The random guess accuracy for 15 equally represented classes is 6.7%, which indicates that the network can extract some script type features and improves on random classification. We can argue that script type classification is an expressible function, but the network needs more data to learn this function.


Table 2: Patch and page level accuracies (%) on the typical test set and the blind test set using different network architectures for classifying 15 script sub-types.


               Patch level          Page level
               Typical    Blind     Typical    Blind
DenseNet         97.97    32.95       98.63    38.36
AlexNet          91.99    27.03       93.15    28.28
VGG11            99.16    35.55      100.00    35.63
SqueezeNet       98.03    30.38       98.63    29.45
ResNet18         97.07    30.95       98.63    34.25
ResNet50         99.55    36.15       98.63    39.73
InceptionV3      94.94    26.41       95.89    26.71


4.2 Classifying Square and Cursive Script Types 


Table 3 shows the accuracy results for classifying only two script types, square and cursive, using different convolutional networks. From a human paleographer's point of view, it is almost impossible to mistake square script for cursive script (while the boundaries between square and semi-square, and between semi-square and cursive, can be blurry). Thus, a good result obtained by the algorithm in this case indicates that the algorithm learns the correct features in the manuscript, those which represent the script itself. We can note that the typical test set accuracy is fully saturated, whereas there is still some room for improvement in blind test set accuracy. This result strengthens the above argument that more samples should be used in the training phase; we see that decreasing the number of classes from 15 to two (which increases the number of samples per class) leads to higher accuracy.


Table 3: Patch and page level accuracies (%) on the typical test set and the blind test set using different network architectures for classifying square and cursive script sub-types.


               Patch level          Page level
               Typical    Blind     Typical    Blind
DenseNet         99.85    87.06      100.00    83.72
AlexNet          99.48    88.01      100.00    91.86
VGG11            99.93    86.45      100.00    88.37
ResNet18         96.65    86.85      100.00    87.21
ResNet50         99.99    90.58      100.00    94.19
SqueezeNet       98.03    82.45      100.00    86.05
InceptionV3      99.16    82.06      100.00    80.23


5 Discussion 


The classification accuracy of the best performing model, ResNet50, is around 35% on the blind test set. When comparing class accuracies (Figure 4), we found that particular classes have an accuracy over 50%: Yemenite semi-square, Sephardic square, Sephardic semi-square, Oriental square, and Ashkenazi semi-square. These results indicate that a classification system designed only for these classes will have higher page level accuracy, since the page level accuracy is computed by the majority vote over the patches from the same page.

The Byzantine square sub-type has a very low accuracy because it was confused with Byzantine semi-square (Figure 5). Interestingly, this confusion is not mutual, because the Byzantine semi-square sub-type was confused with Italian. In contrast, the confusion among the Italian, Oriental, Sephardic, and Yemenite semi-square sub-types is mutual. The mutual confusions may be due to the paleographers' ambiguity in the ground truth of semi-square types or to insufficient ground truth.
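
The per-class accuracies and the notion of mutual confusion discussed here can be read off a confusion matrix. The sketch below shows one way to compute them from lists of true and predicted labels; the function names are hypothetical, and "confused with" is taken to mean the most frequent wrong prediction for a class, which is an assumption about how the figures were interpreted.

```python
from collections import Counter, defaultdict


def confusion_matrix(true_labels, predicted_labels):
    """matrix[t][p] = number of patches of true class t predicted as class p."""
    matrix = defaultdict(Counter)
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def class_accuracies(matrix):
    """Share of each true class that was predicted correctly."""
    return {true: row[true] / sum(row.values()) for true, row in matrix.items()}


def top_confusion(matrix, label):
    """The wrong class most often predicted for `label`, or None if there is none."""
    wrong = {p: n for p, n in matrix[label].items() if p != label}
    return max(wrong, key=wrong.get) if wrong else None


def is_mutual(matrix, a, b):
    """True when a is mostly mistaken for b and b is mostly mistaken for a."""
    return top_confusion(matrix, a) == b and top_confusion(matrix, b) == a
```

With these helpers, a one-directional confusion such as the Byzantine square case shows up as `top_confusion` pointing from one class to another while `is_mutual` returns `False`.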
Figure 4: Patch level class accuracies on the blind test set using the ResNet50 network; prepared by authors.

Figure 5: Patch level confusion matrix on the blind test set using the ResNet50 network; prepared by authors using https://www.cs.ryerson.ca/~aharley/vis/conv/flat.html (accessed 26 January, 2022).
6 Conclusions


From the paleographic point of view, it would be beneficial to gain insight into the features that underlie the class decisions. We are developing a fine-grained classification model that can spot the regions taken into account for script type decisions. In addition, we are collecting and labeling more document page images, as machine learning for a 15-class problem requires around 15,000 samples in total. Our algorithm significantly surpasses the random guess accuracy for 15 classes (6.7%), which indicates that the network can extract some script type features. We can argue that script type classification is an expressible function, but the network needs more data to learn this function. When more material is brought in for comparison and the size of the training, typical, and blind sets increases, we expect the accuracy of the algorithm to improve.


7 Our Work and its Place in the Overall Theme 
of Jewish Studies in the Digital Age 


Our project belongs to the field of digital research of manuscripts and historical 
documents. 

The quantity of digitally available manuscripts and documents in different libraries and archives is constantly growing, and the amount of material available is already often more than an individual researcher can process manually. In all likelihood, in the foreseeable future, a human researcher will formulate a problem and the processing of large amounts of data will be assigned to an algorithm. For this to be possible, algorithms must recognize, classify, and ultimately search through large quantities of unrecognized manuscripts and documents. The integration of computer-based techniques can now bring to manuscript research the often-missing quality of objectivity and the possibility of objective verification of results. It also brings with it the possibility of solving problems that are beyond the physical capacities of an individual researcher.

We use the theoretical framework of Hebrew paleography to train deep learning neural networks to classify Hebrew script types and sub-types. Our project works side by side with, and complements, such ongoing projects as eScriptorium, the Friedberg Genizah Project, and more.
Joshua Waxman
Projecting Punctuation From an Interpolated 
Translation and Commentary 


Abstract: Many classical Jewish texts were composed without punctuation, and modern punctuated versions are desired. We describe an algorithm we employed to create a punctuated digital edition of the Babylonian Talmud, a text composed in a mixture of Hebrew and Aramaic. Our approach is to first word-align the original Talmudic text with an interpolated translation and commentary composed by Rabbi Adin Steinsaltz. We apply heuristics to identify the governing punctuation in the commentary text and then project that punctuation along with other lexical features. Our results are quite good overall, with both recall and precision often in the 90%-95% range when compared with an edited punctuated text.


Keywords: digital editions, projection, alignment, automatic punctuation 


1 Introduction 


Traditional historical Hebrew texts lack punctuation, such as commas and periods, 
which could inform the reader where a phrase or sentence begins and ends. Even 
when readers are able to determine sentence boundaries, they may be unsure of the 
tenor of a sentence — whether it is a question, answer, exclamation, or simple state- 
ment — something that exclamation marks, question marks, and periods can help 
disambiguate. These texts are often unvocalized as well, introducing an entirely 
different set of ambiguities, which couple with the punctuation ambiguities. This 
presents a huge obstacle and learning curve to the novice first approaching these 
texts. 

An example of such an unpunctuated text is the Babylonian Talmud. Indeed, most of the Talmudic tractates that appear on Sefaria's website lack punctuation.1 While the Koren printed version of this Talmud has punctuation, the digital version does not. We therefore set out to generate a freely available punctuated


1 See www.sefaria.org. After implementing this first version of this algorithm, I contacted Se- 
faria to offer them the tool and generated corpus, prior to the beginning of the Daf Yomi cycle. It 
turns out that they independently had developed a similar process to extract punctuation from 
the Steinsaltz Hebrew commentary. Following the Daf Yomi cycle, they release each punctuated 
tractate, with human editing to correct egregious errors. 


Open Access. © 2022 Joshua Waxman, published by De Gruyter. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://doi.org/10.1515/9783110744828-017
digital edition of the Talmud, which can be useful both for readers and for downstream computational analysis.


1.1 The Talmud's Concern with Punctuation 


An ambiguity in the opening statement of the Babylonian Talmud illustrates the 
difficulties of ambiguous sentence segmentation and how punctuation can help. 
The first Mishnah on Berakhot 2a begins: 


מאימתי קורין את שמע בערבין משעה שהכהנים נכנסים לאכול בתרומתן עד סוף האשמורה הראשונה דברי ר' אליעזר וחכמים אומרים עד חצות


(From when does one recite Shema in the evening? From the time when the priests enter to 
partake of their teruma. Until the end of the first watch. That is the statement of Rabbi Eliezer. 
The Rabbis say: until midnight.) 


The Mishnah is concerned with a start time for recitation of the evening Shema, 
and a start time is given. Then, an end time is given, followed by an attribution 
to Rabbi Eliezer ben Hyrcanus. Finally, the Sages disagree with the end time. Due 
to the lack of punctuation, it is unclear whether the answer as to the start time 
is part of Rabbi Eliezer’s statement, or stands apart as an anonymous first Tanna 
(the Tanna Kamma). 

A brayta on Berakhot 2b records Rabbi Eliezer as giving a different starting 
time, “from the time when the day becomes sanctified on the eve of Shabbat,” but 
the Talmud (Berakhot 3a) addresses the contradiction with the Mishnah: 


קשיא דר' אליעזר אדר' אליעזר תרי תנאי אליבא דר' אליעזר ואיבעית אימא רישא לאו ר' אליעזר היא


(The opinion of Rabbi Eliezer contradicts the opinion of Rabbi Eliezer! Either these are two 
tannaim in accordance with Rabbi Eliezer, or alternatively, the first part of the statement is 
not from Rabbi Eliezer.) 


To put it into our own terms, there might be a comma after the word בתרומתן, in which case the attribution of the full statement belongs to Rabbi Eliezer (with a resultant contradiction). Alternatively, there is a period after the word בתרומתן. If so, Rabbi Eliezer did not author the statement in the Mishnah about the starting time, eliminating the contradiction.

This example demonstrates that punctuation can play an important role in
understanding the Talmudic discourse, and that the resolution of such ambiguities
is sometimes itself a topic of that discourse. Further, it shows possible
limitations of automatic linguistic approaches when compared to manual punctuation
by a knowledgeable human. Simple syntactic and lexical knowledge is insufficient
here; deep contextual knowledge (in this case, of Rabbi Eliezer's opinion as
stated in the brayta) is required.

While punctuation usually aids readers by making the parsing of a text more
straightforward, this is an interesting case because the Talmud's question relies
on the ambiguity. Which of the two parses offered by the Talmud should be used
in a punctuated text? This issue is shared by translations and commentaries,
which translate or explain ambiguous terms or phrases based on the ultimate
conclusions of the Talmud's analysis, so that readers must reinject the
ambiguity. Automated punctuation would always select one option. Here, Steinsaltz
took pains to punctuate in a special way that preserves the ambiguity.² Clearly,
this punctuating decision is something best performed by a thoughtful human.


1.2 The Importance of Punctuation 


People who try to read the Talmud are faced with several obstacles and
ambiguities to overcome. Those for whom Hebrew is not their first language must
mentally translate the text and must possess an extensive foreign-language
vocabulary. Talmudic statements are often terse, and many details need to be
filled in. (See Figure 1 and compare the amount of literal and gloss text.)
Further, the words in the standard Vilna edition of the Talmud lack vowel points,
and several point combinations are possible. Is the word שקמתי to be read shiqmati
(“my sycamore”) or she-qamti (“that I have arisen”)?³ Hebrew is morphologically
rich, and one word may be subdivided into several constituent morphemes, with the
mental division again based on the aforementioned vowel points. Thus, הקפה could
be הקפה (“orbit”), ה + קפה (“the coffee”), הקף שלה (“her perimeter”), and so on,
to take an example from Tsarfaty et al.⁴ One
would select the best meaning based on context. Meanwhile, the Talmud is com- 


2 He wrote: 


‘927 ANWR minus quo p 4namna xd 00131 DNN nyvin PIWI YAW ns pp NYRA 
.myn Ty co as pnm MYON 


By placing a period after both statements that might be ascribed to Rabbi Eliezer, including
immediately before דברי רבי אליעזר, “These are the words of Rabbi Eliezer,” he deliberately leaves the
parsing ambiguous and anticipates the Talmud’s question. In contrast, his regular punctuation
would be to place a comma before 737, such as on Berakhot 2b, preceding both "sn 3337 and 
Apr ^2 927; and on Berakhot 31b, preceding b8ynw’ »z3 37. 

3 This example is drawn from the Song of Deborah in Judges 5:7, where it is vocalized she-qamti. 
4 Reut Tsarfaty, Amit Seker, Shoval Sadde, and Stav Klein, “What’s Wrong with Hebrew NLP? 
And How to Make It Right," in Proceedings of the 55th Annual Meeting of the Association for Com- 
putational Linguistics, EMNLP, Hong Kong, 2019. 
posed in a mix of Biblical Hebrew, Middle Hebrew, and Babylonian Aramaic, with
different vocabulary and syntax, and with frequent code switching between them,
often at phrase or sentence boundaries. The presence of multiple languages
introduces new ambiguities, so דבר could mean not only “speak” or “thing” but
also “of the son of.” People who study the Talmud read a phrase at a time and then
resolve its meaning, and without punctuation, it can be difficult to know where a
phrase or sentence ends. Knowledge of phrase boundaries could limit the preceding
ambiguities. Finally, the Talmud is a record of intensive debate, and it can be
ambiguous whether a statement is a question or an answer. Punctuation can make
the reading of the Talmud somewhat easier.

These ambiguities present a problem for computer programs as well. Tsarfaty
and colleagues present a joint morpho-syntactic parsing framework for Modern
Hebrew (YAP), as opposed to the typical pipeline model.⁵ The ambiguities (vowel
points, lemmatization, part-of-speech tagging) cannot be resolved one after
another in sequence; each can be resolved only in context. For the aforementioned
reasons, the Talmudic text is even more difficult to process. Our ongoing work in
named entity recognition, relation extraction, and discourse analysis of the
Talmud, described in part by Waxman,⁶ would benefit from a punctuated text. The
process described in this article will become an upstream process for future
work, enabling processing of the actual Talmudic text.


2 Method


2.1 Resources


The William Davidson Talmud is a digital edition of the Babylonian Talmud made
available at Sefaria.⁷ Among its features are three parallel and aligned texts: the
actual Hebrew and Aramaic Talmudic text (henceforth called “Original Text”),
Rabbi Adin Steinsaltz’s Modern Hebrew translation and commentary (“Hebrew
Commentary”), and his English translation and commentary (“English
Commentary”). Some of these texts were previously available in printed form: the


5 Tsarfaty et al., "What's Wrong with Hebrew NLP?" 

6 Joshua Waxman, "A Graph Database of Scholastic Relationships in the Babylonian Talmud," in 
Proceedings of the Digital Humanities Conference 2019, Utrecht, 2019; Joshua Waxman, “A Graph 
Database of Scholastic Relationships in the Babylonian Talmud," Digital Scholarship in the Hu- 
manities 36, no. 2 (2021): 277-89. Accessed March 1, 2022, https://doi.org/10.1093/llc/fqab015. 

7 Accessed March 1, 2022, https://www.sefaria.org/william-davidson-talmud. 
Figure 1: A selection of the Hebrew Steinsaltz Talmud, Berakhot 2a.


Hebrew Commentary in the original Hebrew Steinsaltz Talmud, 1968-2010, and 
the English Commentary in the Noé Edition of the Koren Talmud Bavli. 

The Original Text differs slightly from the standard Vilna Edition Talmud.
For instance, abbreviations (ר' for רבי, א"כ for אם כן) are expanded, and censored
passages have been restored. It also differs from the Talmudic text in the printed
Steinsaltz editions of the Talmud. Those editions contained elaborate punctuation
as well as Hebrew vowel points, which the Original Text lacks. Also, it draws from
a variant text, so many words and phrases are different.⁸ The center column of
Figure 1 contains an example of the printed vocalized, punctuated Talmudic text.

The Hebrew Commentary is an interpolated translation and commentary, as 
shown in the leftmost column of Figure 1. Since Mishnaic / Talmudic Hebrew, 
Babylonian Aramaic, and Modern Hebrew are all Semitic languages, there is 
a significant overlap in their vocabulary. The original and concise Talmudic 
words are included in the commentary as bold text, with smaller nonbold gloss 
text in between providing commentary and elaboration. This gloss is sometimes 
brief and provides a smoother flow. For instance, the original Talmudic text is 


8 See, for instance, Kiddushin 31a, where the Original Text has 731 due to censorship, while Stein- 
saltz's printed text and Hebrew Commentary restore the word %3. Similarly, see Shabbat 135b where 
the Original Text has an entire repeated sentence from the brayta omitted by the printed Steinsaltz 
Talmud and Hebrew commentary. 
תנא היכא קאי דקתני מאימתי. In the commentary text, the first word is expanded
into התנא של משנתנו. The appended Hebrew prefix ה is the Hebrew definite
article, where Aramaic does not employ it, and של משנתנו elaborates that the Tanna
under discussion is the anonymous narrator of the Mishnah. This becomes flowing
Modern Hebrew text, or a mixture of Modern Hebrew, Hebrew, and Aramaic text.
Other short interjections include framing the discourse, as a question or answer. 
Other times, the commentary introduces background and Scriptural basis, or elabo- 
rates on unfamiliar concepts, in which case the nonbold commentary text is longer. 

When an Aramaic word or phrase occurs which cannot pass for Hebrew, it 
still appears in bold, but is followed by a Modern Hebrew translation in square 
brackets. Within these brackets, the literal translation of words is printed in a 
Courier font, while any elaboration for the sake of flow appears in the typical 
gloss font. For example, in the figure, the continuation of the sentence under dis- 
cussion is NYRA "np ^Np XIN, which is glossed in brackets as 1nıu Nin 197A] 
SPRA? [1110 sinV ma Pan xin mmmp aa: bx by manon. When Mishnaic 
or Talmudic Hebrew appears as literal text but is sufficiently awkward in Modern 
Hebrew, it is similarly glossed, but in parentheses. Thus, in the Mishnah, *nmxn 
is glossed as (ayw ıran ^na) and pinya is glossed as (232). 

Importantly for our purposes, the Hebrew Commentary contains punctuation. 
Further, the digital edition of the text carries over all of the punctuation and format- 
ting of the original, so we can distinguish between bolded literal Talmudic text and 
nonbolded gloss, gloss affixes to literal text, translations of Hebrew and transla- 
tions of Aramaic, and which translations of Aramaic are literal and which are gloss. 

Our third resource, the English Commentary, is an interpolated commentary 
as well, alternating between bolded literal translated text and nonbolded gloss 
text, with transliterations in italics. Within the Sefaria digital edition, these three
texts (Original Text, Hebrew Commentary, and English Commentary) are aligned by
translation unit. A translation unit is not necessarily a word, phrase, sentence,
or paragraph, but some logical unit of the text that an editor selected. Sometimes
this translation unit is an entire paragraph, including multiple statements by
multiple speakers.

Another noteworthy resource is the vocalized Talmudic text produced by
Dicta's Nakdan project.⁹ Their system produces Hebrew vowel points for
unvocalized text. While it works best on Modern Hebrew, by pairing the system with
a human editor, they have produced a vocalized Talmudic text for all of the Orders
of Zera'im and Moed. This digital edition is freely available at http://daf-yomi.com/.


9 See Avi Shmidman, Shaltiel Shmidman, Moshe Koppel, and Yoav Goldberg, “Nakdan: Profes- 
sional Hebrew Diacritizer," in Proceedings of the 58th Annual Meeting of the Association for Com- 
putational Linguistics: System Demonstrations, 2020. 
Figure 2: Steinsaltz punctuated text of the Mishnah on Berakhot 2a (right) and a selection
of the Hebrew commentary (left). 


Sefaria has paired Dicta's vocalized Talmudic text with a punctuation
algorithm similar to the one described in this chapter.¹⁰ Before this text is
published on their website, it is manually edited to fix errors and add missing
punctuation. They regularly publish each tractate in advance of the start of that
tractate in the Daf Yomi cycle, and that vocalized, punctuated text is what is
presented as the Hebrew-Aramaic Talmudic text on their website. We use this edited
punctuated text as a “Gold Standard” to evaluate our own system's results.


2.2 Approach 


Within each translation unit of the digital Sefaria corpus, we select the Original 
Text and the aligned Hebrew Commentary, in order to project the punctuation 
from the commentary to the Talmud. 

In our first step, we parse the HTML of the Hebrew Commentary to distinguish 
bolded from nonbolded text, and tokenize the text to separate words and punc- 
tuation. We thus obtain a corpus of tokens, each marked as literal translation or 
gloss. Some of the punctuation marks in this corpus are marked literal, since they 
were bolded along with their adjacent word, but these are not necessarily the 


10 See footnote 1. 
punctuation marks we want in the final corpus. For instance, in the gloss on the
Mishnah on Berakhot 2a, consider the commentary text in Figure 2. The bolded
punctuation after the word השחר is a comma, but we would prefer the nonbolded
punctuation mark of a period which appears at the end of the gloss text. There
are punctuation marks in the gloss text that we would like to utilize as well. For
example, in Figure 2, after הקטר חלבים ואברים, we would like to obtain one of the
commas which follow in the gloss text.
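
The first step can be illustrated with a minimal sketch. It is not the code used to build the corpus. Where the chapter relies on BeautifulSoup4 and NLTK's WordPunctTokenizer, this sketch swaps in Python's standard-library html.parser and a comparable regular expression so that it runs without dependencies. The assumption that literal text is wrapped in <b> or <strong> tags, and the names LiteralGlossParser and tokenize_commentary, are hypothetical.

```python
import re
from html.parser import HTMLParser

# A token is a run of word characters or a run of punctuation characters,
# similar in spirit to NLTK's WordPunctTokenizer.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


class LiteralGlossParser(HTMLParser):
    """Collect (token, is_literal) pairs; text inside bold tags is literal."""

    BOLD_TAGS = {"b", "strong"}  # assumption: how literal text is marked up

    def __init__(self):
        super().__init__()
        self.bold_depth = 0
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BOLD_TAGS:
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self.bold_depth > 0:
            self.bold_depth -= 1

    def handle_data(self, data):
        # Every token inherits the bold state of the text node it came from.
        for tok in TOKEN_RE.findall(data):
            self.tokens.append((tok, self.bold_depth > 0))


def tokenize_commentary(html):
    """Return the commentary as a list of (token, is_literal) pairs."""
    parser = LiteralGlossParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.tokens
```

Note that a bolded comma comes out marked as literal here, which is exactly the situation described above: such marks still have to be weighed against nonbolded punctuation later in the gloss.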

In theory, this Hebrew Commentary, stripped of all gloss words, could serve 
as the entire basis for our punctuated corpus. After all, the bold, literal words are 
the Talmudic text, and it is just a matter of deciding which punctuation marks to 
interject. In practice, there are several reasons this would not suffice. Firstly, the
version of the Talmudic text in the commentary will occasionally differ from our
desired text. Thus, in Figure 2, the desired text of the Mishnah is as on the right,
in which there is a first clause of הקטר חלבים ואברים מצותן עד שיעלה עמוד השחר and a
second clause of וכל הנאכלים ליום אחד מצותן עד שיעלה עמוד השחר. The version in the
commentary text collapses the statements into הקטר חלבים ואברים וכל הנאכלים ליום
אחד מצותן עד שיעלה עמוד השחר. Other differences occur due to censorship - e.g., "13
vs% for "gentile." Still others include plene vs. deficient spellings of words, cor- 
rections of errors such as pn" for wn, or separation of one word into two (my3 ^N 
vs. m you). Secondly, translation considerations occasionally cause omissions in 
the commentary text. For instance, in Figure 1, the correct Talmudic text is "nn? 
NU"33 mnv, but the bolded commentary text omits the leading 7, because of 
the gloss Hebrew word bw. Thirdly, we would like to project the punctuation onto 
other Talmudic texts available on Sefaria, which have expanded rabbinic titles 
from *' to 21 or ^a, expanded abbreviation such as 5" to 73 Ds, and have intro- 
duced Hebrew vowel points. 

Therefore, our second step is word alignment of the Original Text and Hebrew 
Commentary. These texts were already aligned by translation unit (sentence or 
paragraph), but this is a further alignment. Since we are aligning words that are 
highly similar and occur in the same order across texts, but with gaps due to 
gloss text or textual variants, we use the Needleman-Wunsch algorithm. This
algorithm aligns matches in parallel sequences, rewards matches, and penalizes
mismatches and gaps (with a constant penalty regardless of gap length). To capture
matches even with slight spelling differences, our match reward is based on the
ratio of overlapping character bigrams of the letters of the two words.
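
The alignment step can be sketched as follows. This is an illustrative reading rather than our released implementation. The bigram ratio is taken here to be the Dice coefficient, the gap penalty is charged once per skipped word, and the numeric values of gap, mismatch, and threshold are all assumptions chosen for the example.

```python
def bigrams(word):
    """Set of character bigrams; a one-letter word stands for itself."""
    return {word[i:i + 2] for i in range(len(word) - 1)} or {word}


def similarity(a, b):
    """Ratio of shared character bigrams (Dice coefficient), between 0 and 1."""
    ba, bb = bigrams(a), bigrams(b)
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def align(src, tgt, gap=-0.5, mismatch=-1.0, threshold=0.5):
    """Needleman-Wunsch alignment of two word lists.

    Returns a list of (i, j) index pairs, where None marks a gap.
    """
    n, m = len(src), len(tgt)

    def pair_score(i, j):
        s = similarity(src[i], tgt[j])
        return s if s >= threshold else mismatch

    # Fill the dynamic-programming table.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i - 1, j - 1),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(i - 1, j - 1)):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return pairs[::-1]
```

With these settings, a word with a slight spelling difference still pairs with its counterpart, while an unrelated intervening word is left opposite a gap.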

We actually first select only the bolded, literal tokens of the Hebrew
Commentary, and align just those tokens against the Original Text tokens. We then
use the original word positions to apply the alignment to the full corpus. While
aligning against the unfiltered Hebrew Commentary would often have worked because
of the gap penalty, the alignment could fail in instances such as short phrases
surrounded by gloss text. Instead, we rely on the manually produced bolded text,
which is more reliable.
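
Carrying the alignment back to the full commentary amounts to an index lookup, as in the following sketch. The token representation (pairs of text and a flag marking literal words) and the function names are hypothetical and serve only to illustrate the idea.

```python
def literal_positions(tokens):
    """Indices, within the full commentary, of the literal (bolded) words.

    tokens: list of (text, is_literal_word) pairs.
    """
    return [i for i, (_, is_literal) in enumerate(tokens) if is_literal]


def lift_alignment(pairs, positions):
    """Map an alignment over the filtered literal words back to full indices.

    pairs: (original_index, filtered_commentary_index) pairs; None marks a gap.
    positions: output of literal_positions for the same commentary.
    """
    return [
        (orig, positions[comm] if comm is not None else None)
        for orig, comm in pairs
    ]
```

Once the pairs refer to positions in the full commentary, everything lying between two consecutive aligned words (gloss words and punctuation alike) can be inspected.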

Our third step is projection of punctuation across these word-aligned texts. 
Only literal words, and not gloss text or bolded or nonbolded punctuation, have 
been aligned. We loop through the aligned Original Text, and any time there is a 
gap, we consider all of the Hebrew Commentary punctuation tokens parallel to 
that gap. We rank the punctuation tokens as in Table 1. 


Table 1: Punctuation ranking.


Punctuation                                     Rank

Comma                                           1
Semicolon                                       2
Colon                                           3
Em-dash                                         4
Period                                          5
Exclamation Mark, Question Mark, Interrobang    6

Ranking is needed because, at times, the intervening text has multiple punctuation
marks, which can appear in any order. Our heuristic is to select the first
occurrence of the highest-ranked punctuation mark. Thus, a period would replace a
comma, but a comma would not replace a period. If we have already encountered a
question mark, which finishes the sentence, we would not replace it with an
exclamation mark that appears later in the gloss text, because it is of equal rank.
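
The selection heuristic is simple enough to state in a few lines. The sketch below uses the ranks of Table 1. The spelling of the interrobang as "?!" or "!?" and the function name are assumptions made for illustration.

```python
# Ranks follow Table 1; "\u2014" is the em-dash character.
RANK = {
    ",": 1,
    ";": 2,
    ":": 3,
    "\u2014": 4,
    ".": 5,
    "!": 6,
    "?": 6,
    "?!": 6,
    "!?": 6,
}


def select_punctuation(gap_tokens):
    """Return the first occurrence of the highest-ranked mark, or None."""
    best = None
    for tok in gap_tokens:
        rank = RANK.get(tok)
        # Strictly greater: a later mark of equal rank never replaces an earlier one.
        if rank is not None and (best is None or rank > RANK[best]):
            best = tok
    return best
```

The strict comparison is what keeps an earlier question mark from being replaced by a later exclamation mark of equal rank.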

Parentheses in the Hebrew Commentary are not projected, because they 
either represent citations or translations of Rabbinic Hebrew to Modern Hebrew, 
which is not punctuation that should appear in the Original Text. Furthermore, 
in searching for punctuation with the highest rank, we ignore all other punctua- 
tion marks which appear inside parentheses, since they do not refer to anything 
in the Talmudic text. Likewise, we ignore brackets, which represent translations 
from Aramaic into Hebrew, as well as most punctuation inside those brackets. We 
don't ignore punctuation immediately preceding the closing bracket, as often the 
commentator places the punctuation there instead of after the closing bracket. 
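
A sketch of this filtering follows. It assumes that every bracket, parenthesis, and punctuation mark is already its own token, and the names PUNCT and visible_tokens are hypothetical.

```python
PUNCT = {",", ";", ":", "\u2014", ".", "!", "?", "?!", "!?"}


def visible_tokens(tokens):
    """Drop parentheses and brackets together with their contents.

    The one exception: a punctuation mark immediately before a closing
    square bracket is kept, since it often belongs after the bracket.
    """
    kept = []
    paren_depth = 0
    bracket_depth = 0
    for idx, tok in enumerate(tokens):
        if tok == "(":
            paren_depth += 1
        elif tok == ")":
            paren_depth = max(0, paren_depth - 1)
        elif tok == "[":
            bracket_depth += 1
        elif tok == "]":
            bracket_depth = max(0, bracket_depth - 1)
        elif paren_depth > 0:
            continue  # citation or Hebrew-to-Hebrew gloss: ignore entirely
        elif bracket_depth > 0:
            nxt = tokens[idx + 1] if idx + 1 < len(tokens) else None
            if nxt == "]" and tok in PUNCT:
                kept.append(tok)
        else:
            kept.append(tok)
    return kept
```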

We also project quotation marks, which are employed in quotations of bib- 
lical verses as well as in quotation of key phrases under discussion from Tan- 
naitic sources mentioned earlier in the text. In the Hebrew Commentary, these 
are non-directional quotation marks, and we would like to project them in pairs, 
so as to have both open and closed quotation marks. We consider two cases. If the 
quotes begin and end in the gloss text, then they are not relevant to the actual
Talmudic text and are ignored. If the open quote appears in one gap and the close
quote appears in another gap, then they enclose the intervening literal Talmudic
text and are properly projected. We ignore other punctuation marks inside of
ignored quotation marks.
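
The two cases can be told apart by checking whether a pair of quotation marks encloses any literal word, as in this sketch. Pairing the marks strictly in order of appearance (first with second, third with fourth) is an assumption, as are the token representation and the function name.

```python
def projectable_quote_pairs(tokens):
    """Find quotation-mark pairs that enclose literal Talmudic words.

    tokens: list of (text, is_literal_word) pairs for the full commentary.
    Returns (open_index, close_index) pairs worth projecting.
    """
    quote_positions = [i for i, (text, _) in enumerate(tokens) if text == '"']
    pairs = []
    # Non-directional quotes: pair them in order of appearance.
    for k in range(0, len(quote_positions) - 1, 2):
        start, end = quote_positions[k], quote_positions[k + 1]
        encloses_literal = any(is_lit for _, is_lit in tokens[start + 1:end])
        if encloses_literal:
            pairs.append((start, end))  # opens in one gap, closes in another
        # Otherwise the quote lies wholly within gloss text and is ignored.
    return pairs
```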

We do not model level of certainty in our output corpus. Our heuristic selects 
and projects just one punctuation mark for each gap. In theory, we could have 
projected each occurring punctuation mark and labeled our judgment about 
each - e.g., that a comma followed by a period in gloss text is only 10% likely to 
be authentic and 90% likely to be the period. In future work, we may take up this 
problem. 

The final version of our algorithm works well, but we encountered a few
challenges during our implementation. The Hebrew Commentary is encoded in
HTML, with markup tags for different types of text (original words, gloss, Hebrew
translation). Stripping all HTML or parsing the HTML ourselves provided noisy
or insufficiently detailed data. We ultimately used the BeautifulSoup4 module,
which allowed looping through each textual element and its (optionally)
associated tag. For the unpunctuated Original Text, we tokenized by splitting on
spaces (which kept parentheses and brackets, which indicate textual variants,
with their words), but used NLTK's WordPunctTokenizer to separate off punctuation
in the Hebrew Commentary. Postprocessing the output was required to merge the
separated characters of an interrobang and to revert curly single and double
end-quotes to their original form. We also found that projecting quotation marks
alongside other punctuation was too complex: edge cases led to spurious
punctuation before and after the quotation mark. A better process was to identify
and store positions of pairs of quotation marks, then project the punctuation,
and finally to project a subset of the quotation mark pairs. The ranking of
punctuation marks was subjective and was repeatedly adjusted after examining the
output sentences and the particular employment of the punctuation in the Hebrew
Commentary.¹¹ Finally, an unfortunate omission of our process is that colons in
the Original Text, which indicated the end of a sugya or a citation from a
Mishnah, have been stripped out.


11 For example, a statement in a brayta is cited and ends with a period. In the context of the 
discourse, the purpose of the citation is to attack an Amora's position, so the gloss text contains 
a question mark. Which would we project? If we select the question mark, then consider that 
sometimes the brayta is cited for discussion, and the question in the gloss text is simply to antic- 
ipate an Amora's explanation. If so, a period is better. Some of these punctuation pairs should 
be more carefully considered and fine-tuned, rather than ranked, such as the case of an em-dash 
after a period. 
3 Results and Evaluation


We have released punctuated Talmudic text at
https://github.com/joshwaxman/Punctuated/. It contains punctuation for 37
tractates. Each line in the file corresponds to a translation unit (that is,
paragraph) in the Sefaria database. On perusing the text, we found the projected
punctuation to be quite helpful.

As discussed above, we compare to a Gold Standard, which is a manually
edited partial Talmudic text. Table 2 presents a confusion matrix for tractate
Berakhot, comparing all expected punctuation according to this Gold Standard with
the punctuation marks generated by our algorithm. For instance, the row for “?”
corresponds to each time “?” appeared in the Gold Standard, and each column
indicates how many times a given punctuation mark appeared in that position in
our generated corpus (e.g., 794 times as the same question mark, and two times as
an interrobang). The null column indicates how many times no punctuation appeared
in that position.

Overall, these results are quite good, with both recall and precision often in
the 90%-95% range. If we relaxed our evaluation and considered marks on the same
punctuation level as equivalent, or even counted the mere helpful existence of
punctuation as a match, the recall and precision would be even higher. An
examination of the confusion matrices reveals that the alternatives are often
reasonable. Quotation marks, projected via a separate process which does not lead
to confusion with other punctuation, do not appear in the table, but have a recall
of 95.6% and precision of 93.1%.
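
For readers who wish to reproduce this kind of evaluation, the sketch below shows how per-mark precision and recall follow from a confusion matrix of this shape. It assumes that the Gold Standard and the generated text have already been reduced to one mark (or None) per word position. The function names are hypothetical, and the numbers in the usage note are made up rather than taken from Table 2.

```python
from collections import Counter


def confusion(gold, predicted):
    """Count (gold_mark, predicted_mark) pairs over aligned word positions.

    gold, predicted: equal-length lists; None means no punctuation there.
    """
    return Counter(zip(gold, predicted))


def precision_recall(matrix, mark):
    """Precision and recall for one punctuation mark."""
    true_positives = matrix[(mark, mark)]
    gold_total = sum(c for (g, _), c in matrix.items() if g == mark)  # row sum
    pred_total = sum(c for (_, p), c in matrix.items() if p == mark)  # column sum
    precision = true_positives / pred_total if pred_total else 0.0
    recall = true_positives / gold_total if gold_total else 0.0
    return precision, recall
```

For a toy example with gold marks ["?", "?", ".", None, ","] and generated marks ["?", "?!", ".", ",", ","], the comma has full recall but only 50% precision, because one comma was generated where the gold text has no punctuation.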

While we referred to this above as recall and precision, what we are really
describing is degree of overlap. While some differences are due to the human
editor, many others are due to differences in our respective processes. For
instance, the Gold Standard will often have an em-dash following another
punctuation mark, whereas we would only project one mark, leading to 88% recall;
we also project many more commas where the Gold Standard has none, leading to 88%
precision. We might also resolve ambiguities among multiple punctuation marks in
the gloss text differently. Additionally, we don't project any punctuation
appearing in parentheses, but relaxing that restriction increases our recall.
Some of our results are indeed erroneous. For instance, the commentary often
omitted punctuation marks at the end of paragraphs, especially where brackets
translated the final Aramaic phrase. These account for some nulls instead of
sentence terminators in Table 2.
4 Future Work


In this work, we projected punctuation from a parallel interpolated translation 
and commentary in a related Semitic language. Therefore, there was significant 
overlap in actual incorporated words and the order in which they appeared. In 
future work, we intend to phrase-align and project punctuation from the parallel 
English Commentary. 

Further, this work is part of a larger project, a digital edition of the
Babylonian Talmud, described in part by Waxman.¹² The punctuated corpus is useful
in its own right, but its availability enables various downstream processes. For 
instance, while much previous work was most successful on the (punctuated) 
English parallel aligned Talmudic text, we anticipate better named entity recog- 
nition and relation extraction on the punctuated Hebrew / Aramaic text. Other 
work enabled by the punctuation includes language identification and discourse 
classification (based on machine learning and on a crafted semantic grammar). 

Looking at recent digital humanities work such as that presented in this
volume, I see a great deal of promise in the new digital age of Jewish Studies.
Within my own focus of academic Talmudic study, digital approaches can be 
transformative. The depth and type of research that the expert in humanities but 
relative novice in computers can quickly and readily perform has expanded. For 
instance, performing stylometry for authorship identification on an unattributed 
rabbinic commentary is relatively straightforward. 

This is due to several factors. Modern high-level programming languages 
(notably Python 3) are designed with powerful abstractions such as lists, sets, dic- 
tionaries, and list comprehensions so that it is easy for a programmer to accom- 
plish a lot with very little work. There are many well-supported libraries for these 
programming languages (such as, for Python: Natural Language Toolkit, Stanza, 
spaCy, scikit-learn) which have implemented necessary algorithms, so that a pro- 
grammer can simply pass texts into them and extract linguistic analyses. Texts 
are widely and increasingly available in digital form, on Wikitext or on Sefaria, 
and OCR works well on other texts. 

Even for non-programmers, collections and digital tools are made available 
on websites, such as Hachi Garsinan, Dicta, and Shebanq. I often need to look up 
Talmudic variants. In the past, I would see variants in Dikdukei Soferim, or look 
at individual manuscripts in a library. The Hachi Garsinan project, at the Fried- 
berg Jewish Manuscript Society website, gives the full text of several printings 


12 Waxman, “A Graph Database of Scholastic Relationships in the Babylonian Talmud" (2019); 
Waxman, “A Graph Database of Scholastic Relationships in the Babylonian Talmud" (2021). 
and manuscripts in columnar format, highlighting differences in red, and also
makes available the digitized image of the page in the manuscript. (They allow 
download of a spreadsheet with the variants for a page.) This allows me to see 
relevant changes across several manuscripts, and double-check that I agree with 
the reading, something that would otherwise have taken much time and effort. 

An extremely useful task for many Talmud scholars is identifying commonal- 
ity between texts. When studying one Babylonian Talmudic passage, it is helpful 
to know of parallels elsewhere in the Talmud, which can provide a fuller picture 
of how the ideas are discussed or allow tracing the development of a sugya - e.g., 
that there was ha'avara, transfer of the passage. It is helpful to find earlier sources 
and later commentaries quoting the passage. This cannot be performed by simple 
string matching, because of paraphrase or other textual variation. Dicta provides 
a tool to perform such analysis, and results have been integrated into other pro- 
jects, such as Sefaria. 

This digital age of Jewish Studies is increasingly open and collaborative, with
the texts and tools open source or with public APIs, so that projects can build
upon one another. Alongside advances in big data and deep learning, this
collaborative field allows for even more sophisticated work, by both novices and
experts, which differs in number and kind from previous efforts.
I look forward to seeing how this develops further.